Abstract

We  provide  a  model  to  investigate  the  tension  between  information  aggregation  and 
spread  of  misinformation  in  large  societies  (conceptualized  as  networks  of  agents  com- 
municating with  each  other).  Each  individual  holds  a  belief  represented  by  a  scalar. 
Individuals  meet  pairwise  and  exchange  information,  which  is  modeled  as  both  individ- 
uals adopting  the  average  of  their  pre-meeting  beliefs.  When  all  individuals  engage  in 
this type of information exchange, the society will be able to effectively aggregate the
initial  information  held  by  all  individuals.  There  is  also  the  possibility  of  misinformation, 
however,  because  some  of  the  individuals  are  "forceful,"  meaning  that  they  influence  the 
beliefs  of  (some)  of  the  other  individuals  they  meet,  but  do  not  change  their  own  opinion. 
The  paper  characterizes  how  the  presence  of  forceful  agents  interferes  with  information 
aggregation.  Under  the  assumption  that  even  forceful  agents  obtain  some  information 
(however  infrequent)  from  some  others  (and  additional  weak  regularity  conditions),  we 
first  show  that  beliefs  in  this  class  of  societies  converge  to  a  consensus  among  all  in- 
dividuals. This  consensus  value  is  a  random  variable,  however,  and  we  characterize  its 
behavior.  Our  main  results  quantify  the  extent  of  misinformation  in  the  society  by  either 
providing  bounds  or  exact  results  (in  some  special  cases)  on  how  far  the  consensus  value 
can  be  from  the  benchmark  without  forceful  agents  (where  there  is  efficient  information 
aggregation).  The  worst  outcomes  obtain  when  there  are  several  forceful  agents  and 
forceful  agents  themselves  update  their  beliefs  only  on  the  basis  of  information  they 
obtain  from  individuals  most  likely  to  have  received  their  own  information  previously. 

Keywords:  information  aggregation,  learning,  misinformation,  social  networks. 
1     Introduction

Individuals  form  beliefs  on  various  economic,  political  and  social  variables  ("state") 
based  on  information  they  receive  from  others,  including  friends,  neighbors  and  cowork- 
ers as  well  as  local  leaders,  news  sources  and  political  actors.  A  key  tradeoff  faced  by 
any  society  is  whether  this  process  of  information  exchange  will  lead  to  the  formation 
of  more  accurate  beliefs  or  to  certain  systematic  biases  and  spread  of  misinformation. 
A  famous  idea  going  back  to  Condorcet's  Jury  Theorem  (now  often  emphasized  in  the 
context  of  ideas  related  to  "wisdom  of  the  crowds")  encapsulates  the  idea  that  exchange 
of  dispersed  information  will  enable  socially  beneficial  aggregation  of  information.  How- 
ever, as  several  examples  ranging  from  the  effects  of  the  Swift  Boat  ads  during  the  2004 
presidential  campaign  to  the  beliefs  in  the  Middle  East  that  9/11  was  a  US  or  Israeli 
conspiracy  illustrate,  in  practice  social  groups  are  often  swayed  by  misleading  ads,  media 
outlets, and political leaders, and hold on to incorrect and inaccurate beliefs.

A  central  question  for  social  science  is  to  understand  the  conditions  under  which 
exchange  of  information  will  lead  to  the  spread  of  misinformation  instead  of  aggregation 
of  dispersed  information.  In  this  paper,  we  take  a  first  step  towards  developing  and 
analyzing  a  framework  for  providing  answers  to  this  question.  While  the  issue  of  misin- 
formation can  be  studied  using  Bayesian  models,  non-Bayesian  models  appear  to  provide 
a more natural starting point.¹ Our modeling strategy is therefore to use a non-Bayesian
model,  which  however  is  reminiscent  of  a  Bayesian  model  in  the  absence  of  "forceful" 
agents  (who  are  either  trying  to  mislead  or  influence  others  or  are,  for  various  rational 
or  irrational  reasons,  not  interested  in  updating  their  opinions). 

We consider a society envisaged as a social network of $n$ agents, communicating and
exchanging information. Specifically, each agent is interested in learning some underlying
state $\theta \in \mathbb{R}$ and receives a signal $x_i(0) \in \mathbb{R}$ in the beginning. We assume that
$\theta = \frac{1}{n} \sum_{j=1}^{n} x_j(0)$, so that information about the relevant state is dispersed and this
information can be easily aggregated if the agents can communicate in a centralized or
decentralized fashion.

Information  exchange  between  agents  takes  place  as  follows:  Each  individual  is  "rec- 
ognized" according  to  a  Poisson  process  in  continuous  time  and  conditional  on  this  event, 
meets  one  of  the  individuals  in  her  social  neighborhood  according  to  a  pre-specified 
stochastic  process.  We  think  of  this  stochastic  process  as  representing  an  underlying 
social  network  (for  example,  friendships,  information  networks,  etc.).  Following  this 
meeting, there is a potential exchange of information between the two individuals, affecting
the beliefs of one or both agents. We distinguish between two types of individuals:
regular or forceful. When two regular agents meet, they update their beliefs to be equal
to the average of their pre-meeting beliefs. This structure, though non-Bayesian, has a


¹ In particular, misinformation can arise in a Bayesian model if an agent (receiver) is unsure of the
type of another agent (sender) providing her with information and the sender happens to be of a type
intending to mislead the receiver. Nevertheless, this type of misinformation will be limited since if the
probability  that  the  sender  is  of  the  misleading  type  is  high,  the  receiver  will  not  change  her  beliefs 
much  on  the  basis  of  the  sender's  communication. 


simple and appealing interpretation and ensures the convergence of beliefs to the underlying
state $\theta$ when the society consists only of regular agents.² In contrast, when
an agent meets a forceful agent, this may result in the forceful agent "influencing" his
beliefs so that this individual inherits the forceful agent's belief except for an $\epsilon$ weight
on his pre-meeting belief.³ Our modeling of forceful agents is sufficiently general to nest
both  individuals  (or  media  outlets)  that  purposefully  wish  to  influence  others  with  their 
opinion  or  individuals  who,  for  various  reasons,  may  have  more  influence  with  some 
subset of the population.⁴ A key assumption of our analysis is that even forceful agents
engage  in  some  updating  of  their  beliefs  (even  if  infrequently)  as  a  result  of  exchange 
of  information  with  their  own  social  neighborhoods.  This  assumption  captures  the  in- 
tuitive notion  that  "no  man  is  an  island"  and  thus  receives  some  nontrivial  input  from 
the social context in which he or she is situated.⁵ The influence pattern of social agents
superimposed over the social network can be described by directed links, referred to
as forceful links, and creates a richer stochastic process, representing the evolution of
beliefs  in  the  society.  Both  with  and  without  forceful  agents,  the  evolution  of  beliefs 
can  be  represented  by  a  Markov  chain  and  our  analysis  will  exploit  this  connection.  We 
will  frequently  distinguish  the  Markov  chain  representing  the  evolution  of  beliefs  and 
the  Markov  chain  induced  by  the  underlying  social  network  (i.e.,  just  corresponding  to 
the  communication  structure  in  the  society,  without  taking  into  account  the  influence 
pattern)  and  properties  of  both  will  play  a  central  role  in  our  results. 

Our  objective  is  to  characterize  the  evolution  of  beliefs  and  quantify  the  effect  of 
forceful  agents  on  public  opinion  in  the  context  of  this  model.  Our  first  result  is  that, 
despite  the  presence  of  forceful  agents,  the  opinion  of  all  agents  in  this  social  network 
converges to a common, though stochastic, value under weak regularity conditions. More
formally, each agent's opinion converges to a value given by $\pi' x(0)$, where $x(0)$ is the
vector of initial beliefs and $\pi$ is a random vector. Our measure of spread of misinformation
in the society will be $\bar{\pi}' x(0) - \theta = \sum_{i=1}^{n} (\bar{\pi}_i - 1/n) x_i(0)$, where $\bar{\pi}$ is the expected
value of $\pi$ and $\bar{\pi}_i$ denotes its $i$th component. The greater is this gap, the greater is
the potential for misinformation in this society. Moreover, this formula also makes it
clear that $\bar{\pi}_i - 1/n$ gives the excess influence of agent $i$. Our strategy will be to develop


² The appealing interpretation is that this type of averaging would be optimal if both agents had
beliefs drawn from a normal distribution with mean equal to the underlying state and equal precision.
This interpretation is discussed in detail in DeMarzo, Vayanos, and Zwiebel [16] in a related context.

³ When $\epsilon = 1/2$, the individual treats the forceful agent just as any other regular agent (is
not influenced by him over and above the information exchange) and the only difference from the
interaction between two regular agents is that the forceful agent himself does not update his beliefs. All
of our analysis is conducted for arbitrary $\epsilon$, so whether forceful agents are also "influential" in pairwise
meetings is not important for any of our findings.

⁴ What we do not allow are individuals who know the underlying state and try to convince others of
some  systematic  bias  relative  to  the  underlying  state,  though  the  model  could  be  modified  to  fit  this 
possibility  as  well. 

⁵ When there are several forceful agents and none of them ever change their opinion, then it is
straightforward to see that opinions in this society will never settle into a "stationary" distribution.
While  this  case  is  also  interesting  to  study,  it  is  significantly  more  difficult  to  analyze  and  requires  a 
different  mathematical  approach. 


bounds  on  the  spread  of  misinformation  in  the  society  (as  defined  above)  and  on  the 
excess  influence  of  each  agent  for  general  social  networks  and  also  provide  exact  results 
for  some  special  networks. 
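
As a purely illustrative sketch of the two quantities defined above, the following snippet computes the misinformation gap $\bar{\pi}' x(0) - \theta$ and the excess influence $\bar{\pi}_i - 1/n$ of each agent from a given expected weight vector. The function name `misinformation_gap` and the numerical values are hypothetical; they are assumptions made for illustration and do not come from the analysis in this paper, which characterizes $\bar{\pi}$ rather than taking it as given.

```python
def misinformation_gap(pi_bar, x0):
    """Return (gap, excess) for an assumed expected weight vector pi_bar.

    gap       = pi_bar' x(0) - theta, with theta the average of x(0)
    excess[i] = pi_bar[i] - 1/n, the excess influence of agent i
    """
    n = len(x0)
    theta = sum(x0) / n
    excess = [p - 1.0 / n for p in pi_bar]
    gap = sum(p * x for p, x in zip(pi_bar, x0)) - theta
    return gap, excess


# Hypothetical example: agent 0 carries twice the weight of the others.
pi_bar = [0.4, 0.2, 0.2, 0.2]
x0 = [1.0, 0.0, 0.0, 0.0]
gap, excess = misinformation_gap(pi_bar, x0)
```

In this hypothetical example the benchmark is $\theta = 0.25$ while the expected consensus is $0.4$, so the gap equals the excess influence of agent 0 times that agent's initial belief.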

We provide three types of results. First, using tools from matrix perturbation theory,⁶
we provide global and general upper bounds on the extent of misinformation as a
function  of  the  properties  of  the  underlying  social  network.  In  particular,  the  bounds 
relate  to  the  spectral  gap  and  the  mixing  properties  of  the  Markov  chain  induced  by  the 
social  network.  Recall  that  a  Markov  chain  is  fast-mixing  if  it  converges  rapidly  to  its 
stationary  distribution.  It  will  do  so  when  it  has  a  large  spectral  gap,  or  loosely  speak- 
ing, when  it  is  highly  connected  and  possesses  many  potential  paths  of  communication 
between  any  pair  of  agents.  Intuitively,  societies  represented  by  fast-mixing  Markov 
chains  have  more  limited  room  for  misinformation  because  forceful  agents  themselves 
are  influenced  by  the  weighted  opinion  of  the  rest  of  the  society  before  they  can  spread 
their  own  (potentially  extreme)  views.  A  corollary  of  these  results  is  that  for  a  spe- 
cial class  of  societies,  corresponding  to  "expander  graphs",  misinformation  disappears 
in  large  societies  provided  that  there  is  a  finite  number  of  forceful  agents  and  no  forceful 
agent has global impact.⁷ In contrast, the extent of misinformation can be substantial
in  slow-mixing  Markov  chains,  also  for  an  intuitive  reason.  Societies  represented  by 
such  Markov  chains  would  have  a  high  degree  of  partitioning  (multiple  clusters  with 
weak  communication  in  between),  so  that  forceful  agents  receive  their  information  from 
others  who  previously  were  influenced  by  them,  ensuring  that  their  potentially  extreme 
opinions are never moderated.⁸
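
To make the contrast between fast-mixing and slow-mixing chains concrete, the sketch below compares two small doubly stochastic chains: a lazy random walk on a complete graph and a lazy random walk on a ring. These two chains, the helper names, and the use of total variation distance from a point mass are our own illustrative assumptions; they are not the chains analyzed later in the paper.

```python
def lazy_ring(n):
    """Lazy walk on a ring: stay w.p. 1/2, move to each neighbor w.p. 1/4 (n >= 3)."""
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        T[i][i] = 0.5
        T[i][(i + 1) % n] += 0.25
        T[i][(i - 1) % n] += 0.25
    return T


def lazy_complete(n):
    """Lazy walk on a complete graph: stay w.p. 1/2, otherwise move uniformly."""
    off = 0.5 / (n - 1)
    return [[0.5 if i == j else off for j in range(n)] for i in range(n)]


def tv_to_uniform(T, steps):
    """Total variation distance to uniform after `steps` steps from state 0."""
    n = len(T)
    p = [1.0] + [0.0] * (n - 1)
    for _ in range(steps):
        # Row vector times matrix: one step of the chain.
        p = [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]
    return 0.5 * sum(abs(q - 1.0 / n) for q in p)


tv_complete = tv_to_uniform(lazy_complete(8), 10)
tv_ring = tv_to_uniform(lazy_ring(8), 10)
```

After the same number of steps, the highly connected chain is much closer to its (uniform) stationary distribution than the ring, which is the sense in which it mixes faster.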

Our second set of results exploits the local structure of the social network in the
neighborhood  of  each  forceful  agent  in  order  to  provide  a  tighter  characterization  of  the 
extent  of  misinformation  and  excess  influence.  Fast-mixing  and  spectral  gap  properties 
are  global  (and  refer  to  the  properties  of  the  overall  social  network  representing  meeting 
and  communication  patterns  among  all  agents).  As  such,  they  may  reflect  properties  of 
a  social  network  far  from  where  the  forceful  agents  are  located.  If  so,  our  first  set  of 
bounds  will  not  be  tight.  To  redress  this  problem,  we  develop  an  alternative  analysis 
using  mean  (first)  passage  times  of  the  Markov  chain  and  show  how  it  is  not  only  the 
global  properties  of  the  social  network,  but  also  the  local  social  context  in  which  forceful 
agents  are  situated  that  matter.  For  example,  in  a  social  network  with  a  single  dense 
cluster  and  several  non-clustered  pockets,  it  matters  greatly  whether  forceful  links  are 
located  inside  the  cluster  or  not.     We  illustrate  this  result  sharply  by  first  focusing 


⁶ In particular, we decompose the transition matrix of the Markov chain into a doubly stochastic
matrix, representing the underlying social network, and a remainder matrix, representing a directed
influence graph. Despite the term "perturbation," this remainder matrix need not be "small" in any
sense.

⁷ Expander graphs are graphs whose spectral gap remains bounded away from zero as the number
of nodes tends to infinity. Several networks related to the Internet correspond to expander graphs; see,
for example, Mihail, Papadimitriou, and Saberi [27].

⁸ This result is related to Golub and Jackson [20], where they relate learning to homophily properties
of  the  social  network. 


on  graphs  with  forceful  essential  edges,  that  is,  graphs  representing  societies  in  which 
a single forceful link connects two otherwise disconnected components. This, loosely
speaking, represents a situation in which a forceful agent, for example a media outlet or
a political party leader, obtains all of its (or his or her) information from a small group
of  individuals  and  influences  the  rest  of  the  society.  In  this  context,  we  establish  the 
surprising  result  that  all  members  of  the  small  group  will  have  the  same  excess  influence, 
even  though  some  of  them  may  have  much  weaker  links  or  no  links  to  the  forceful  agent. 
This  result  is  an  implication  of  the  society  having  a  (single)  forceful  essential  edge  and 
reflects  the  fact  that  the  information  among  the  small  group  of  individuals  who  are  the 
source  of  information  of  the  forceful  agent  aggregates  rapidly  and  thus  it  is  the  average 
of their beliefs that matters. We then generalize these results and intuitions to more
general  graphs  using  the  notion  of  information  bottlenecks. 

Our third set of results is more technical in nature and provides new conceptual
tools and algorithms for characterizing the role of information bottlenecks. In particular,
we  introduce  the  concept  of  relative  cuts  and  present  several  new  results  related  to 
relative  cuts  and  how  these  relate  to  mean  first  passage  times.  For  our  purposes,  these 
new  results  are  useful  because  they  enable  us  to  quantify  the  extent  of  local  clustering 
around  forceful  agents.  Using  the  notion  of  relative  cuts,  we  develop  new  algorithms 
based  on  graph  clustering  that  enable  us  to  provide  improved  bounds  on  the  extent  of 
misinformation  in  beliefs  as  a  function  of  information  bottlenecks  in  the  social  network. 

Our  paper  is  related  to  a  large  and  growing  learning  literature.  Much  of  this  literature 
focuses  on  various  Bayesian  models  of  observational  or  communication-based  learning; 
for example Bikhchandani, Hirshleifer and Welch [8], Banerjee [6], Smith and Sorensen
[36],  [35],  Banerjee  and  Fudenberg  [7],  Bala  and  Goyal  [4],  [5],  Gale  and  Kariv  [18],  and 
Celen  and  Kariv  [12],  [11].  These  papers  develop  models  of  social  learning  either  using 
a  Bayesian  perspective  or  exploiting  some  plausible  rule-of-thumb  behavior.  Acemoglu, 
Dahleh,  Lobel  and  Ozdaglar  [1]  provide  an  analysis  of  Bayesian  learning  over  general 
social  networks.  Our  paper  is  most  closely  related  to  DeGroot  [15],  DeMarzo,  Vayanos 
and  Zwiebel  [16]  and  Golub  and  Jackson  [21],  [20],  who  also  consider  non-Bayesian 
learning over a social network represented by a connected graph.⁹ None of the papers
mentioned  above  consider  the  issue  of  the  spread  of  misinformation  (or  the  tension 
between  aggregation  of  information  and  spread  of  misinformation),  though  there  are 
close parallels between Golub and Jackson's and our characterizations of influence.¹⁰ In


⁹ An important distinction is that in contrast to the "averaging" model used in these papers, we
have a model of pairwise interactions. We believe that this model has a more attractive economic
interpretation, since it does not have the feature that neighbors' information will be averaged at each
date (even though the same information was exchanged the previous period). In contrast, in the pairwise
meeting model (without forceful agents), if a pair meets two periods in a row, in the second meeting
there  is  no  information  to  exchange  and  no  change  in  beliefs  takes  place. 

¹⁰ In particular, Golub and Jackson [20] characterize the effects of homophily on learning and influence
in two different models of learning in terms of mixing properties and the spectral gap of graphs. In one
of their learning models, which builds on DeGroot [15], DeMarzo, Vayanos and Zwiebel [16] and Golub
and Jackson [21], homophily has negative effects on learning (and speed of learning) for reasons related
to our finding that in slow-mixing graphs, misinformation can spread more.


addition to our focus, the methods of analysis here, which develop bounds on the extent
of misinformation and provide exact characterization of excess influence in certain classes
of social networks, are entirely new in the literature and also rely on the development
of new results in the analysis of Markov chains.

Our  work  is  also  related  to  other  work  in  the  economics  of  communication,  in  par- 
ticular, to  cheap-talk  models  based  on  Crawford  and  Sobel  [14]  (see  also  Farrell  and 
Gibbons  [17]  and  Sobel  [37]),  and  some  recent  learning  papers  incorporating  cheap-talk 
games  into  a  network  structure  (see  Ambrus  and  Takahashi  [3],  Hagenbach  and  Koessler 
[22],  and  Galeotti,  Ghiglino  and  Squintani  [19]). 

In  addition  to  the  papers  on  learning  mentioned  above,  our  paper  is  related  to  work 
on  consensus,  which  is  motivated  by  different  problems,  but  typically  leads  to  a  similar 
mathematical  formulation  (Tsitsiklis  [38],  Tsitsiklis,  Bertsekas  and  Athans  [39],  Jad- 
babaie,  Lin  and  Morse  [25],  Olfati-Saber  and  Murray  [29],  Olshevsky  and  Tsitsiklis  [30], 
Nedic  and  Ozdaglar  [28]).  In  consensus  problems,  the  focus  is  on  whether  the  beliefs 
or  the  values  held  by  different  units  (which  might  correspond  to  individuals,  sensors  or 
distributed processors) converge to a common value. Our analysis here focuses not only
on consensus, but also on whether the consensus happens around the true value of
the  underlying  state.  There  are  also  no  parallels  in  this  literature  to  our  bounds  on 
misinformation  and  characterization  results. 

The  rest  of  this  paper  is  organized  as  follows:  In  Section  2,  we  introduce  our  model  of 
interaction  between  the  agents  and  describe  the  resulting  evolution  of  individual  beliefs. 
We  also  state  our  assumptions  on  connectivity  and  information  exchange  between  the 
agents.  Section  3  presents  our  main  convergence  result  on  the  evolution  of  agent  beliefs 
over  time.  In  Section  4,  we  provide  bounds  on  the  extent  of  misinformation  as  a  function 
of  the  global  network  parameters.  Section  5  focuses  on  the  effects  of  location  of  forceful 
links  on  the  spread  of  misinformation  and  provides  bounds  as  a  function  of  the  local 
connectivity  and  location  of  forceful  agents  in  the  network.  Section  6  contains  our 
concluding  remarks. 

Notation and Terminology: A vector is viewed as a column vector, unless clearly
stated otherwise. We denote by $x_i$ or $[x]_i$ the $i$th component of a vector $x$. When $x_i \ge 0$
for all components $i$ of a vector $x$, we write $x \ge 0$. For a matrix $A$, we write $A_{ij}$ or $[A]_{ij}$
to denote the matrix entry in the $i$th row and $j$th column. We write $x'$ to denote the
transpose of a vector $x$. The scalar product of two vectors $x, y \in \mathbb{R}^m$ is denoted by $x'y$.
We use $\|x\|_2$ to denote the standard Euclidean norm, $\|x\|_2 = \sqrt{x'x}$. We write $\|x\|_\infty$ to
denote the max norm, $\|x\|_\infty = \max_{1 \le i \le m} |x_i|$. We use $e_i$ to denote the vector with $i$th
entry equal to 1 and all other entries equal to 0. We denote by $e$ the vector with all
entries equal to 1.

A vector $a$ is said to be a stochastic vector when $a_i \ge 0$ for all $i$ and $\sum_i a_i = 1$.
A square matrix $A$ is said to be a (row) stochastic matrix when each row of $A$ is a
stochastic vector. The transpose of a matrix $A$ is denoted by $A'$. A square matrix $A$ is
said to be a doubly stochastic matrix when both $A$ and $A'$ are stochastic matrices.


2     Belief  Evolution 

2.1      Description  of  the  Environment 

We consider a set $\mathcal{N} = \{1, \ldots, n\}$ of agents interacting over a social network. Each
agent $i$ starts with an initial belief about an underlying state, which we denote by
$x_i(0) \in \mathbb{R}$. Agents exchange information with their neighbors and update their beliefs.
We assume that there are two types of agents: regular and forceful. Regular agents
exchange  information  with  their  neighbors  (when  they  meet).  In  contrast,  forceful  agents 
influence  others  disproportionately. 

We  use  an  asynchronous  continuous-time  model  to  represent  meetings  between  agents 
(also  studied  in  Boyd  et  al.  [9]  in  the  context  of  communication  networks).  In  particular, 
we  assume  that  each  agent  meets  (communicates  with)  other  agents  at  instances  defined 
by  a  rate  one  Poisson  process  independent  of  other  agents.  This  implies  that  the  meeting 
instances (over all agents) occur according to a rate $n$ Poisson process at times $t_k$, $k \ge 1$.
Note that in this model, by convention, at most one node is active (i.e., is meeting
another) at a given time. We discretize time according to meeting instances (since these
are the relevant instances at which the beliefs change), and refer to the interval $[t_k, t_{k+1})$
as the $k$th time slot. On average, there are $n$ meeting instances per unit of absolute time
(see Boyd et al. [9] for a precise relation between these instances and absolute time).
Suppose that at time (slot) $k$, agent $i$ is chosen to meet another agent (probability
$1/n$). In this case, agent $i$ will meet agent $j \in \mathcal{N}$ with probability $p_{ij}$. Following a
meeting between $i$ and $j$, there is a potential exchange of information. Throughout,
we assume that all events that happen in a meeting are independent of any other event
that happened in the past. Let $x_i(k)$ denote the belief of agent $i$ about the underlying
state  at  time  k.  The  agents  update  their  beliefs  according  to  one  of  the  following  three 
possibilities. 

(i) Agents $i$ and $j$ reach pairwise consensus and the beliefs are updated according to

$$x_i(k+1) = x_j(k+1) = \frac{x_i(k) + x_j(k)}{2}.$$

We denote the conditional probability of this event (conditional on $i$ meeting $j$)
as $\beta_{ij}$.

(ii) Agent $j$ influences agent $i$, in which case for some $\epsilon \in (0, 1/2]$, beliefs change
according to

$$x_i(k+1) = \epsilon x_i(k) + (1 - \epsilon) x_j(k), \quad \text{and} \quad x_j(k+1) = x_j(k). \qquad (1)$$

In this case beliefs of agent $j$ do not change.¹¹ We denote the conditional probability
of this event as $\alpha_{ij}$, and refer to it as the influence probability. Note that


¹¹ We could allow the self-belief weight $\epsilon$ to be different for each agent $i$. This generality does not
change  the  results  or  the  economic  intuitions,  so  for  notational  convenience,  we  assume  this  weight  to 
be  the  same  across  all  agents. 


we allow $\epsilon = 1/2$, so that agent $i$ may be treating agent $j$ just as a regular agent,
except that agent $j$ himself does not change his beliefs.

(iii) Agents $i$ and $j$ do not agree and stick to their beliefs, i.e.,

$$x_i(k+1) = x_i(k), \quad \text{and} \quad x_j(k+1) = x_j(k).$$

This event has probability $\gamma_{ij} = 1 - \beta_{ij} - \alpha_{ij}$.
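
The three possibilities (i)-(iii) can be sketched as a single update function. The function name `update_beliefs` and the string labels for the events are hypothetical conveniences introduced here; only the update rules themselves follow the description above.

```python
def update_beliefs(x, i, j, event, eps=0.5):
    """Return the belief vector after agent i meets agent j.

    event is one of (labels are our own, hypothetical naming):
      "average"   -> case (i):   both adopt the average of their beliefs
      "influence" -> case (ii):  j influences i with self-weight eps, j unchanged
      "none"      -> case (iii): neither belief changes
    """
    y = list(x)
    if event == "average":
        y[i] = y[j] = 0.5 * (x[i] + x[j])
    elif event == "influence":
        if not 0.0 < eps <= 0.5:
            raise ValueError("eps must lie in (0, 1/2]")
        y[i] = eps * x[i] + (1.0 - eps) * x[j]
    elif event != "none":
        raise ValueError("unknown event: %r" % (event,))
    return y


x = [0.0, 1.0]
after_average = update_beliefs(x, 0, 1, "average")
after_influence = update_beliefs(x, 0, 1, "influence", eps=0.25)
after_none = update_beliefs(x, 0, 1, "none")
```

Note that only the averaging event preserves the sum of beliefs; the influence event moves agent $i$ toward agent $j$ without any offsetting change in agent $j$'s belief.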

Any agent $j$ for whom the influence probability $\alpha_{ij} > 0$ for some $i \in \mathcal{N}$ is referred
to as a forceful agent. Moreover, the directed link $(j, i)$ is referred to as a forceful link.¹²

As  discussed  in  the  introduction,  we  can  interpret  forceful  agents  in  multiple  different 
ways. First, forceful agents may correspond to community leaders or news media, who will
have a disproportionate effect on the beliefs of their followers. In such cases, it is natural
to consider $\epsilon$ small and the leaders or media not updating their own beliefs as a result
of others listening to their opinion. Second, forceful agents may be indistinguishable
from regular agents, and thus regular agents engage in what they think is information
exchange, but forceful agents, because of stubbornness or some other motive, do not
incorporate the information of these agents in their own beliefs. In this case, it may be
natural to think of $\epsilon$ as equal to 1/2. The results that follow remain valid with either
interpretation. 

The  influence  structure  described  above  will  determine  the  evolution  of  beliefs  in 
the  society.  Below,  we  will  give  a  more  precise  separation  of  this  evolution  into  two 
components,  one  related  to  the  underlying  social  network  (communication  and  meeting 
structure),  and  the  other  to  influence  patterns. 
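
A minimal simulation sketch of the environment described in this subsection is given below, working directly in discretized time (one meeting per slot). The four-agent society, the uniform meeting probabilities, the specific values of $\beta_{ij}$ and $\alpha_{ij}$, and the function name `simulate` are all hypothetical choices made for illustration, not a calibration used in the paper.

```python
import random


def simulate(x0, p, beta, alpha, eps, steps, rng):
    """Run `steps` meeting slots and return the final belief vector."""
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        i = rng.randrange(n)                         # recognized agent, prob. 1/n
        j = rng.choices(range(n), weights=p[i])[0]   # partner, prob. p_ij
        u = rng.random()
        if u < beta[i][j]:                           # case (i): pairwise averaging
            x[i] = x[j] = 0.5 * (x[i] + x[j])
        elif u < beta[i][j] + alpha[i][j]:           # case (ii): j influences i
            x[i] = eps * x[i] + (1.0 - eps) * x[j]
        # otherwise case (iii): no change
    return x


n = 4
p = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
x0 = [1.0, 0.0, 0.0, 0.0]

# Society with regular agents only: every meeting ends in averaging.
beta_reg = [[1.0] * n for _ in range(n)]
alpha_reg = [[0.0] * n for _ in range(n)]
x_reg = simulate(x0, p, beta_reg, alpha_reg, 0.5, 5000, random.Random(0))

# Agent 0 is forceful: whoever meets agent 0 is influenced w.p. 0.8.
beta_f = [[0.2 if j == 0 else 1.0 for j in range(n)] for i in range(n)]
alpha_f = [[0.8 if j == 0 else 0.0 for j in range(n)] for i in range(n)]
x_f = simulate(x0, p, beta_f, alpha_f, 0.5, 5000, random.Random(0))
```

In the regular-only run the beliefs settle at the initial average $0.25$, since averaging preserves the sum of beliefs. In the run with a forceful agent the beliefs still agree with one another, but the common value depends on the realized sequence of meetings.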

2.2     Assumptions 

We  next  state  our  assumptions  on  the  belief  evolution  model  among  the  agents.    We 
have the following assumption on the agent meeting probabilities $p_{ij}$.

Assumption 1. (Meeting Probabilities)

(a) For all $i$, the probabilities $p_{ii}$ are equal to 0.

(b) For all $i$, the probabilities $p_{ij}$ are nonnegative for all $j$ and they sum to 1 over $j$,
i.e.,

$$p_{ij} \ge 0 \quad \text{for all } i, j, \qquad \sum_{j=1}^{n} p_{ij} = 1 \quad \text{for all } i.$$


¹² We refer to directed links/edges as links and undirected ones as edges.


Assumption 1(a) imposes that "self-communication" is not a possibility, though this
is just a convention, since, as stated above, we allow disagreement among agents, i.e.,
$\gamma_{ij}$ can be positive. We let $P$ denote the matrix with entries $p_{ij}$. Under Assumption
1(b), the matrix $P$ is a stochastic matrix.¹³
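
For a concrete meeting matrix, Assumption 1 can be checked mechanically. The sketch below is one way to do so; the function name `satisfies_assumption_1` and the numerical tolerance are our own hypothetical choices.

```python
def satisfies_assumption_1(P, tol=1e-12):
    """Check that P has zero diagonal, nonnegative entries, and unit row sums."""
    n = len(P)
    for i in range(n):
        if len(P[i]) != n:
            return False                      # P must be square
        if P[i][i] != 0:
            return False                      # part (a): no self-communication
        if any(p < 0 for p in P[i]):
            return False                      # part (b): nonnegativity
        if abs(sum(P[i]) - 1.0) > tol:
            return False                      # part (b): rows sum to 1
    return True


P_ok = [[0.0, 0.5, 0.5],
        [1.0, 0.0, 0.0],
        [0.3, 0.7, 0.0]]
P_self = [[0.5, 0.5],
          [1.0, 0.0]]
```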

We next impose a connectivity assumption on the social network. This assumption is
stated in terms of the directed graph $(\mathcal{N}, \mathcal{E})$, where $\mathcal{E}$ is the set of directed links induced
by the positive meeting probabilities $p_{ij}$, i.e.,

$$\mathcal{E} = \{(i, j) \mid p_{ij} > 0\}. \qquad (2)$$

Assumption 2. (Connectivity) The graph $(\mathcal{N}, \mathcal{E})$ is strongly connected, i.e., for all
$i, j \in \mathcal{N}$, there exists a directed path connecting $i$ to $j$ with links in the set $\mathcal{E}$.

Assumption  2  ensures  that  every  agent  "communicates"  with  every  other  agent  (pos- 
sibly through  multiple  links).  This  is  not  an  innocuous  assumption,  since  otherwise  the 
graph $(\mathcal{N}, \mathcal{E})$ (and the society that it represents) would segment into multiple
non-communicating parts. Though not innocuous, this assumption is also natural for several
reasons. First, the evidence suggests that most subsets of the society are not only connected,
but are connected by means of several links (e.g., Watts [40] and Jackson [24]),
and the same seems to be true for indirect linkages via the Internet. Second, if the society
is segmented into multiple non-communicating parts, the insights here would apply,
with  some  modifications,  to  each  of  these  parts. 

Let us also use $d_{ij}$ to denote the length of the shortest path from $i$ to $j$ and $d$ to
denote the maximum shortest path length between any $i, j \in \mathcal{N}$, i.e.,

$$d = \max_{i, j \in \mathcal{N}} d_{ij}. \qquad (3)$$

In  view  of  Assumption  2,  these  are  all  well-defined  objects. 
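
Both Assumption 2 and the quantity $d$ in (3) can be computed from $P$ by breadth-first search over the links with $p_{ij} > 0$. The following is a minimal sketch; the function names and the convention of returning `None` when the graph is not strongly connected are hypothetical choices of ours.

```python
from collections import deque


def shortest_path_lengths(P, source):
    """Breadth-first search over links (u, v) with P[u][v] > 0."""
    n = len(P)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if P[u][v] > 0 and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(P):
    """Return d = max over (i, j) of d_ij, or None if not strongly connected."""
    n = len(P)
    d = 0
    for i in range(n):
        dist = shortest_path_lengths(P, i)
        if len(dist) < n:
            return None          # some agent is unreachable from i
        d = max(d, max(dist.values()))
    return d


cycle = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
split = [[0.0, 1.0, 0.0],
         [1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0]]
```

On the directed three-cycle the longest shortest path has length 2, while in the second example agent 2 cannot be reached from agents 0 and 1, so Assumption 2 fails.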

Finally,  we  introduce  the  following  assumption  which  ensures  that  there  is  positive 
probability  that  every  agent  (even  if  he  is  forceful)  receives  some  information  from  an 
agent  in  his  neighborhood. 

Assumption 3. (Interaction Probabilities) For all $(i, j) \in \mathcal{E}$, the sum of the averaging
probability $\beta_{ij}$ and the influence probability $\alpha_{ij}$ is positive, i.e.,

$$\beta_{ij} + \alpha_{ij} > 0 \qquad \text{for all } (i, j) \in \mathcal{E}.$$


The connectivity assumption (Assumption 2) ensures that there is a path from any
forceful agent to other agents in the network, implying that for any forceful agent $i$, there
is a link $(i, j) \in \mathcal{E}$ for some $j \in \mathcal{N}$. Then the main role of Assumption 3 is to guarantee
that  even  the  forceful  agents  at  some  point  get  information  from  the  other  agents  in 
¹³ That is, its row sums are equal to 1.


the  network.'^  This  assumption  captures  the  idea  that  "no  man  is  an  island,"  i.e.,  even 
the  behefs  of  forceful  agents  are  affected  by  the  beliefs  of  the  society.  In  the  absence  of 
this  assumption,  any  society  consisting  of  several  forceful  agents  may  never  settle  into 
a  stationary  distribution  of  beliefs.  While  this  is  an  interesting  situation  to  investigate, 
it  requires  a  very  different  approach.  Since  we  view  the  "no  man  is  an  island"  feature 
plausible,  we  find  Assumption  3  a  useful  starting  point.  .,    , 

Throughout  the  rest  of  the  paper,  we  assume  that  Assumptions  1,  2,  and  3  hold. 

2.3     Evolution of Beliefs: Social Network and Influence Matrices

We can express the preceding belief update model compactly as follows. Let
$x(k) = (x_1(k), \ldots, x_n(k))$ denote the vector of agent beliefs at time $k$. The agent beliefs are
updated according to the relation

$$x(k+1) = W(k)\,x(k), \tag{4}$$

where $W(k)$ is a random matrix given by

$$W(k) = \begin{cases}
A_{ij} \equiv I - \dfrac{(e_i - e_j)(e_i - e_j)'}{2} & \text{with probability } p_{ij}\beta_{ij}/n, \\
J_{ij} \equiv I - (1-\epsilon)\, e_i (e_i - e_j)' & \text{with probability } p_{ij}\alpha_{ij}/n, \\
I & \text{with probability } p_{ij}\gamma_{ij}/n,
\end{cases} \tag{5}$$

for all $i, j \in \mathcal{N}$. The preceding belief update model implies that the matrix $W(k)$ is a
stochastic matrix for all $k$, and is independent and identically distributed over all $k$.
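To make the two non-trivial cases of Eq. (5) concrete, the following minimal sketch builds $A_{ij}$ and $J_{ij}$ as nested lists and applies them to a small belief vector. The helper names (`identity`, `averaging_matrix`, `influence_matrix`, `step`), the zero-based agent indices and the numerical values are illustrative assumptions, not part of the model's notation.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]

def averaging_matrix(n, i, j):
    """A_ij = I - (e_i - e_j)(e_i - e_j)'/2: agents i and j both adopt their average."""
    A = identity(n)
    A[i][i] = A[j][j] = 0.5
    A[i][j] = A[j][i] = 0.5
    return A

def influence_matrix(n, i, j, eps):
    """J_ij = I - (1 - eps) e_i (e_i - e_j)': agent j influences i and keeps his own belief."""
    J = identity(n)
    J[i][i] = eps
    J[i][j] = 1.0 - eps
    return J

def step(M, x):
    """One update x(k+1) = M x(k), written as a plain matrix-vector product."""
    return [sum(M[r][c] * x[c] for c in range(len(x))) for r in range(len(M))]

x = [0.0, 1.0, 4.0]                                  # hypothetical beliefs of three agents
print(step(averaging_matrix(3, 0, 2), x))            # [2.0, 1.0, 2.0]
print(step(influence_matrix(3, 0, 2, 0.25), x))      # [3.0, 1.0, 4.0]
```

Both matrices have rows summing to one, but only the averaging matrix also has columns summing to one; in the second case the sum of beliefs changes, which is how a forceful agent can shift the society's average.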
Let us introduce the matrices

$$\Phi(k,s) = W(k)W(k-1)\cdots W(s+1)W(s) \qquad \text{for all } k \text{ and } s \text{ with } k \ge s, \tag{6}$$

with $\Phi(k,k) = W(k)$ for all $k$. We will refer to the matrices $\Phi(k,s)$ as the transition
matrices. We can now write the belief update rule (4) as follows: for all $s$ and $k$ with
$k \ge s \ge 0$ and all agents $i \in \{1, \ldots, n\}$,

$$x_i(k+1) = \sum_{j=1}^{n} [\Phi(k,s)]_{ij}\, x_j(s). \tag{7}$$

Given our assumptions, the random matrix $W(k)$ is identically distributed over all
$k$, and thus we have for some nonnegative matrix $\tilde{W}$,

$$E[W(k)] = \tilde{W} \qquad \text{for all } k \ge 0. \tag{8}$$


This assumption is stated for all $(i,j) \in \mathcal{E}$, thus a forceful agent $i$ receives some information from
any $j$ in his "neighborhood". This is without any loss of generality, since we can always set $p_{ij} = 0$ for
those $j$'s that are in $i$'s neighborhood but from whom $i$ never obtains information.

The matrix $\tilde{W}$, which we refer to as the mean interaction matrix, represents the evolution
of beliefs in the society. It incorporates elements from both the underlying social
network (which determines the meeting patterns) and the influence structure. In what
follows, it will be useful to separate these into two components, both for our mathematical
analysis and to clarify the intuitions. For this purpose, let us use the belief update
model (4)-(5) and write the mean interaction matrix $\tilde{W}$ as follows:


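$$\tilde{W} = \frac{1}{n}\sum_{i,j} p_{ij}\left[\beta_{ij}A_{ij} + \alpha_{ij}J_{ij} + \gamma_{ij}I\right]
= \frac{1}{n}\sum_{i,j} p_{ij}\left[(1-\gamma_{ij})A_{ij} + \gamma_{ij}I\right] + \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\left[J_{ij} - A_{ij}\right],$$

where $A_{ij}$ and $J_{ij}$ are matrices defined in Eq. (5), and the second equality follows from
the fact that $\beta_{ij} = 1 - \alpha_{ij} - \gamma_{ij}$ for all $i, j \in \mathcal{N}$. We use the notation

$$T = \frac{1}{n}\sum_{i,j} p_{ij}\left[(1-\gamma_{ij})A_{ij} + \gamma_{ij}I\right], \qquad
D = \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\left[J_{ij} - A_{ij}\right], \tag{9}$$

to write the mean interaction matrix, $\tilde{W}$, as

$$\tilde{W} = T + D. \tag{10}$$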

Here, the matrix $T$ only depends on meeting probabilities (matrix $P$) except that
it also incorporates $\gamma_{ij}$ (probability that following a meeting no exchange takes place).
We can therefore think of the matrix $T$ as representing the underlying social network
(friendships, communication among coworkers, decisions about which news outlets to
watch, etc.), and refer to it as the social network matrix. It will be useful below to
represent the social interactions using an undirected (and weighted) graph induced by
the social network matrix $T$. This graph is given by $(\mathcal{N}, \mathcal{A})$, where $\mathcal{A}$ is the set of
undirected edges given by

$$\mathcal{A} = \big\{\{i,j\} \mid T_{ij} > 0\big\}, \tag{11}$$

and the weight $w_e$ of edge $e = \{i,j\}$ is given by the entry $T_{ij} = T_{ji}$ of the matrix $T$. We
refer to this graph as the social network graph.

The matrix $D$, on the other hand, can be thought of as representing the influence
structure in the society. It incorporates information about which individuals and links
are forceful (i.e., which types of interactions will lead to one individual influencing the
other without updating his own beliefs). We refer to matrix $D$ as the influence matrix.
It is also useful to note for interpreting the mathematical results below that $T$ is a
doubly stochastic matrix, while $D$ is not. Therefore, Eq. (10) gives a decomposition of
the mean interaction matrix $\tilde{W}$ into a doubly stochastic and a remainder component,
and enables us to use tools from matrix perturbation theory (see Section 4).
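The decomposition in Eqs. (9)-(10) can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the function name `mean_matrices`, the three-agent society, the zero-based indices and all parameter values are hypothetical, and $p_{ii} = 0$ is assumed so that self-meetings never occur.

```python
def mean_matrices(p, alpha, gamma, eps):
    """Return (T, D) of Eq. (9) as nested lists; assumes p[i][i] == 0 and rows of p sum to 1."""
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = p[i][j] / n                      # probability that i is selected and meets j
            for l in range(n):
                T[l][l] += w                     # identity part of (1 - gamma_ij) A_ij + gamma_ij I
            h = 0.5 * w * (1.0 - gamma[i][j])    # weight on (e_i - e_j)(e_i - e_j)'/2
            T[i][i] -= h
            T[j][j] -= h
            T[i][j] += h
            T[j][i] += h
            c = w * alpha[i][j]                  # weight on J_ij - A_ij
            D[i][i] += c * (eps - 0.5)
            D[i][j] += c * (0.5 - eps)
            D[j][i] -= 0.5 * c
            D[j][j] += 0.5 * c
    return T, D

n = 3
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
alpha = [[0.0, 0.5, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # agent 1 is forceful toward agent 0
gamma = [[0.0] * n for _ in range(n)]
T, D = mean_matrices(p, alpha, gamma, eps=0.1)
W = [[T[r][c] + D[r][c] for c in range(n)] for r in range(n)]  # mean interaction matrix

print([round(sum(row), 12) for row in T])                             # row sums of T
print([round(sum(T[r][c] for r in range(n)), 12) for c in range(n)])  # column sums of T
print([round(sum(row), 12) for row in D])                             # row sums of D
```

In this example the rows and columns of `T` sum to one and the rows of `D` sum to zero, while the columns of `D` do not, so `W` is stochastic but not doubly stochastic.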


In the sequel, the notation $\sum_{i,j}$ will be used to denote the double sum $\sum_{i=1}^{n}\sum_{j=1}^{n}$.

3     Convergence

In this section, we provide our main convergence result. In particular, we show that
despite the presence of forceful agents, with potentially very different opinions at the
beginning, the society will ultimately converge to a consensus, in which all individuals
share the same belief. This consensus value of beliefs itself is a random variable. We
also provide a first characterization of the expected value of this consensus belief in
terms of the mean interaction matrix (and thus social network and influence matrices).
Our analysis essentially relies on showing that iterates of Eq. (4), $x(k)$, converge to a
consensus with probability one, i.e., $x(k) \to \bar{x}e$, where $\bar{x}$ is a scalar random variable
that depends on the initial beliefs and the random sequence of matrices $\{W(k)\}$, and $e$
is the vector of all ones. The proof uses two lemmas which are presented in Appendix
B.

Theorem 1. The sequences $\{x_i(k)\}$, $i \in \mathcal{N}$, generated by Eq. (4) converge to a consensus
belief, i.e., there exists a scalar random variable $\bar{x}$ such that

$$\lim_{k \to \infty} x_i(k) = \bar{x} \qquad \text{for all } i \text{ with probability one.}$$

Moreover, the random variable $\bar{x}$ is a convex combination of initial agent beliefs, i.e.,

$$\bar{x} = \sum_{j=1}^{n} \pi_j x_j(0),$$

where $\pi = [\pi_1, \ldots, \pi_n]$ is a random vector that satisfies $\pi_j \ge 0$ for all $j$, and $\sum_{j=1}^{n} \pi_j = 1$.

Proof. By Lemma 9 from Appendix B, we have

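$$P\left\{[\Phi(s + n^2 d - 1, s)]_{ij} \ge \frac{\eta^d}{2}\,\epsilon^{n-1}, \text{ for all } i, j\right\} \ge \left(\frac{\eta^d}{2}\right)^{n^2} \qquad \text{for all } s \ge 0,$$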

where $\Phi(s + n^2 d - 1, s)$ is a transition matrix [cf. Eq. (6)], $d$ is the maximum shortest
path length in graph $(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)], $\epsilon$ is the self belief weight against a forceful
agent [cf. Eq. (1)], and $\eta$ is a positive scalar defined in Eq. (45). This relation implies
that over a window of length $n^2 d$, all entries of the transition matrix $\Phi(s + n^2 d - 1, s)$
are strictly positive with positive probability, which is uniformly bounded away from 0.
Thus, we can use Lemma 6 (from Appendix A) with the identifications

$$H(k) = W(k), \qquad B = n^2 d, \qquad \theta = \frac{\eta^d}{2}\,\epsilon^{n-1}.$$


Letting

$$M(k) = \max_{i \in \mathcal{N}} x_i(k), \qquad m(k) = \min_{i \in \mathcal{N}} x_i(k),$$

this implies that $n\,\frac{\eta^d}{2}\,\epsilon^{n-1} \le 1$ and for all $s \ge 0$,


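$$P\left\{M(s + n^2 d) - m(s + n^2 d) \le \left(1 - n\,\frac{\eta^d}{2}\,\epsilon^{n-1}\right)\big(M(s) - m(s)\big)\right\} \ge \left(\frac{\eta^d}{2}\right)^{n^2}.$$

Moreover, by the stochasticity of the matrix $W(k)$, it follows that the sequence
$\{M(k) - m(k)\}$ is nonincreasing with probability one. Hence, we have for all $s \ge 0$

$$E\big[M(s + n^2 d) - m(s + n^2 d)\big] \le \left[1 - \left(\frac{\eta^d}{2}\right)^{n^2} + \left(\frac{\eta^d}{2}\right)^{n^2}\left(1 - n\,\frac{\eta^d}{2}\,\epsilon^{n-1}\right)\right]\big(M(s) - m(s)\big),$$

from which, for any $k \ge 0$, we obtain

$$E\big[M(k) - m(k)\big] \le \left(1 - \left(\frac{\eta^d}{2}\right)^{n^2} n\,\frac{\eta^d}{2}\,\epsilon^{n-1}\right)^{\left\lfloor \frac{k}{n^2 d} \right\rfloor}\big(M(0) - m(0)\big).$$

This implies that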


$$\lim_{k \to \infty} \big(M(k) - m(k)\big) = 0 \qquad \text{with probability one.}$$


The stochasticity of the matrix $W(k)$ further implies that the sequences $\{M(k)\}$ and
$\{m(k)\}$ are bounded and monotone and therefore converge to the same limit, which we
denote by $\bar{x}$. Since we have

$$m(k) \le x_i(k) \le M(k) \qquad \text{for all } i \text{ and } k \ge 0,$$

it follows that

$$\lim_{k \to \infty} x_i(k) = \bar{x} \qquad \text{for all } i \text{ with probability one,}$$

establishing the first result.

Letting $s = 0$ in Eq. (7), we have for all $i$

$$x_i(k) = \sum_{j=1}^{n} [\Phi(k-1, 0)]_{ij}\, x_j(0) \qquad \text{for all } k \ge 0.$$

From the previous part, for any initial belief vector $x(0)$, the limit

$$\lim_{k \to \infty} x_i(k) = \sum_{j=1}^{n} \lim_{k \to \infty} [\Phi(k-1, 0)]_{ij}\, x_j(0) \tag{12}$$

exists and is independent of $i$. Hence, for any $h$, we can choose $x(0) = e_h$, i.e., $x_h(0) = 1$
and $x_j(0) = 0$ for all $j \ne h$, implying that the limit

$$\lim_{k \to \infty} [\Phi(k-1, 0)]_{ih}$$

exists and is independent of $i$. Denoting this limit by $\pi_h$ and using Eq. (12), we obtain
the desired result, where the properties of the vector $\pi = [\pi_1, \ldots, \pi_n]$ follow from
the stochasticity of matrix $\Phi(k, 0)$ for all $k$ (implying the stochasticity of its limit as
$k \to \infty$). □
The  key  implication  of  this  result  is  that,  despite  the  presence  of  forceful  agents,
the  society  will  ultimately  reach  a  consensus.  Though  surprising  at  first,  this  result  is 
intuitive  in  light  of  our  "no  man  is  an  island"  assumption  (Assumption  3).  However,  in 
contrast  to  "averaging  models"  used  both  in  the  engineering  literature  and  recently  in 
the  learning  literature,  the  consensus  value  here  is  a  random  variable  and  will  depend 
on  the  order  in  which  meetings  have  taken  place.  The  main  role  of  this  result  for  us 
is  that  we  can  now  conduct  our  analysis  on  quantifying  the  extent  of  the  spread  of 
misinformation  by  looking  at  this  consensus  value  of  beliefs. 
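A short simulation makes the consensus result tangible. The sketch below is a minimal illustration, not the authors' code: the function `simulate`, the four-agent complete network, the seeds and all parameter values are hypothetical, and $\gamma_{ij} = 0$ is assumed throughout. Agent 3 is forceful toward every other agent but still averages when he is the one selected, so Assumption 3 holds.

```python
import random

def simulate(x0, p, alpha, eps, steps, seed):
    """Run the pairwise update rule for a number of steps; gamma_ij = 0 is assumed."""
    rng = random.Random(seed)
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        i = rng.randrange(n)                            # selected agent, probability 1/n
        j = rng.choices(range(n), weights=p[i])[0]      # partner drawn from row i of P
        if rng.random() < alpha[i][j]:
            x[i] = eps * x[i] + (1.0 - eps) * x[j]      # j influences i, j keeps his belief
        else:
            x[i] = x[j] = 0.5 * (x[i] + x[j])           # both adopt the average
    return x

n = 4
p = [[0.0 if r == c else 1.0 / (n - 1) for c in range(n)] for r in range(n)]
x0 = [0.0, 0.0, 0.0, 1.0]

no_force = [[0.0] * n for _ in range(n)]
forceful = [[0.9 if (c == 3 and r != 3) else 0.0 for c in range(n)] for r in range(n)]

x_avg = simulate(x0, p, no_force, 0.1, 3000, seed=1)
print(x_avg)                                            # all entries close to the average 0.25
for seed in (2, 3, 4):
    x_end = simulate(x0, p, forceful, 0.1, 5000, seed=seed)
    print(seed, x_end[0], max(x_end) - min(x_end))      # consensus value and remaining spread
```

Without forceful links every update preserves the sum of beliefs, so the run settles at the initial average. With the forceful agent each run still reaches a consensus, but the value can differ from seed to seed because it depends on the order of meetings.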

The next theorem characterizes $E[\bar{x}]$ in terms of the limiting behavior of the matrices
$\tilde{W}^k$ as $k$ goes to infinity.

Theorem 2. Let $\bar{x}$ be the limiting random variable of the sequences $\{x_i(k)\}$, $i \in \mathcal{N}$
generated by Eq. (4) (cf. Theorem 1). Then we have:

(a) The matrix $\tilde{W}^k$ converges to a stochastic matrix with identical rows $\bar{\pi}$ as $k$ goes
to infinity, i.e.,

$$\lim_{k \to \infty} \tilde{W}^k = e\bar{\pi}'.$$

(b) The expected value of $\bar{x}$ is given by a convex combination of the initial agent values
$x_i(0)$, where the weights are given by the components of the probability vector $\bar{\pi}$,
i.e.,

$$E[\bar{x}] = \sum_{i=1}^{n} \bar{\pi}_i x_i(0) = \bar{\pi}' x(0).$$

Proof. (a) This part relies on the properties of the mean interaction matrix established
in Appendix B. In particular, by Lemma 7(a), the mean interaction matrix $\tilde{W}$ is a
primitive matrix. Therefore, the Markov Chain with transition probability matrix $\tilde{W}$ is
regular (see Section 4.1 for a definition). The result follows immediately from Theorem
3(a).

(b) From Eq. (7), we have for all $k \ge 0$

$$x(k) = \Phi(k-1, 0)\,x(0).$$

Moreover, since $x(k) \to \bar{x}e$ as $k \to \infty$, we have

$$E[\bar{x}e] = E\big[\lim_{k \to \infty} x(k)\big] = \lim_{k \to \infty} E[x(k)],$$

where the second equality follows from Lebesgue's Dominated Convergence Theorem
(see [31]). Combining the preceding two relations and using the assumption that the
matrices $W(k)$ are independent and identically distributed over all $k \ge 0$, we obtain

$$E[\bar{x}e] = \lim_{k \to \infty} E[\Phi(k-1, 0)\,x(0)] = \lim_{k \to \infty} \tilde{W}^k x(0),$$

which in view of part (a) implies

$$E[\bar{x}] = \bar{\pi}' x(0).$$

□
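Theorem 2 can be verified by hand in the smallest possible society. The sketch below assumes a hypothetical two-agent example (zero-based indices in the code, all values illustrative): agent 1, when selected, is influenced by agent 2 with probability $\alpha$ and averages otherwise, while agent 2, when selected, always averages. Under Eq. (5) the mean interaction matrix is then $\frac{1}{2}[(1-\alpha)A + \alpha J] + \frac{1}{2}A$, and solving $\bar{\pi}'\tilde{W} = \bar{\pi}'$ for a two-state chain gives the closed form used in the code.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[r][m] * B[m][c] for m in range(n)) for c in range(n)] for r in range(n)]

alpha, eps = 0.5, 0.1
A = [[0.5, 0.5], [0.5, 0.5]]           # averaging matrix A_12 = A_21
J = [[eps, 1.0 - eps], [0.0, 1.0]]     # J_12: agent 2 influences agent 1

# Mean interaction matrix of the two-agent example.
W = [[0.5 * ((1.0 - alpha) * A[r][c] + alpha * J[r][c]) + 0.5 * A[r][c]
      for c in range(2)] for r in range(2)]

P = W
for _ in range(60):
    P = mat_mul(P, W)                  # P now holds a high power of W

# Closed-form stationary distribution of the two-state chain W.
denom = 4.0 - 2.0 * alpha * eps
pi_bar = [(2.0 - alpha) / denom, (2.0 + alpha - 2.0 * alpha * eps) / denom]

x0 = [0.0, 1.0]
expected_consensus = sum(w * x for w, x in zip(pi_bar, x0))
print(P[0], P[1])                      # both rows approach pi_bar
print(pi_bar, expected_consensus)
```

With $\alpha = 0$ the closed form reduces to the uniform weights $(1/2, 1/2)$; for $\alpha > 0$ the forceful agent's weight exceeds $1/2$ by $\alpha(1-\epsilon)/(4 - 2\alpha\epsilon)$.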
Combining Theorem 1 and Theorem 2 (and using the fact that the results hold for
any $x(0)$), we have $\bar{\pi} = E[\pi]$. The stationary distribution $\bar{\pi}$ is crucial in understanding
the formation of opinions since it encapsulates the weight given to each agent (forceful
or regular) in the (limiting) mean consensus value of the society. We refer to the vector
$\bar{\pi}$ as the consensus distribution corresponding to the mean interaction matrix $\tilde{W}$ and its
component $\bar{\pi}_i$ as the weight of agent $i$.

It is also useful at this point to highlight how consensus will form around the correct
value in the absence of forceful agents. Let $\{x(k)\}$ be the belief sequence generated by
the belief update rule of Eq. (4). When there are no forceful agents, i.e., $\alpha_{ij} = 0$ for all
$i, j$, then the interaction matrix $W(k)$ for all $k$ is either equal to an averaging matrix
$A_{ij}$ for some $i, j$ or equal to the identity matrix $I$; hence, $W(k)$ is a doubly stochastic
matrix. This implies that the average value of $x(k)$ remains constant at each iteration,
i.e.,

$$\frac{1}{n}\sum_{i=1}^{n} x_i(k) = \frac{1}{n}\sum_{i=1}^{n} x_i(0) \qquad \text{for all } k \ge 0.$$

Theorem 1 therefore shows that when there are no forceful agents, the sequences $x_i(k)$
for all $i$, converge to the average of the initial beliefs with probability one, aggregating
information. We state this result as a simple corollary.

Corollary 1. Assume that there are no forceful agents, i.e., $\alpha_{ij} = 0$ for all $i, j \in \mathcal{N}$. We
have

$$\lim_{k \to \infty} x_i(k) = \frac{1}{n}\sum_{j=1}^{n} x_j(0) = \theta \qquad \text{with probability one.}$$

Therefore, in the absence of forceful agents, the society is able to aggregate information
effectively. Theorem 2 then also implies that in this case $\pi_i = \bar{\pi}_i = 1/n$ for all $i$ (i.e.,
beliefs converge to a deterministic value), so that no individual has excess influence.
These results no longer hold when there are forceful agents. In the next section, we
investigate the effect of the forceful agents and the structure of the social network on
the extent of misinformation and excess influence of individuals.


4     Global  Limits  on  Misinformation 

In this section, we are interested in providing an upper bound on the expected value of
the difference between the consensus belief $\bar{x}$ (cf. Theorem 1) and the true underlying
state, $\theta$ (or equivalently the average of the initial beliefs), i.e.,

$$E[\bar{x} - \theta] = E[\bar{x}] - \theta = \sum_{i=1}^{n}\left(\bar{\pi}_i - \frac{1}{n}\right)x_i(0), \tag{13}$$

(cf. Theorem 2). Our bound relies on a fundamental theorem from the perturbation
theory of finite Markov Chains. Before presenting the theorem, we first introduce some
terminology and basic results related to Markov Chains.
4.1     Preliminary Results

Consider a finite Markov Chain with $n$ states and transition probability matrix $T$. We
say that a finite Markov chain is regular if its transition probability matrix is a primitive
matrix, i.e., there exists some integer $k > 0$ such that all entries of the power matrix
$T^k$ are positive. The following theorem states basic results on the limiting behavior of
products of transition matrices of Markov Chains (see Theorems 4.1.4, 4.1.6, and 4.3.1
in Kemeny and Snell [26]).

Theorem 3. Consider a regular Markov Chain with $n$ states and transition probability
matrix $T$.

(a) The $k$th power of the transition matrix $T$, $T^k$, converges to a stochastic matrix $T^\infty$
with all rows equal to the probability vector $\pi$, i.e.,

$$\lim_{k \to \infty} T^k = T^\infty = e\pi',$$

where $e$ is the $n$-dimensional vector of all ones.

(b) The probability vector $\pi$ is a left eigenvector of the matrix $T$, i.e.,

$$\pi' T = \pi' \qquad \text{and} \qquad \pi' e = 1.$$

The vector $\pi$ is referred to as the stationary distribution of the Markov Chain.

(c) The matrix $Y = (I - T + T^\infty)^{-1} - T^\infty$ is well-defined and is given by

$$Y = \sum_{k=0}^{\infty} (T^k - T^\infty).$$

The matrix $Y$ is referred to as the fundamental matrix of the Markov Chain.
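As a quick sanity check of Theorem 3, all three objects can be written in closed form for a two-state chain. The example below is an illustration added for concreteness rather than part of the cited results; the transition probabilities $a$ and $b$ are hypothetical parameters with $0 < a, b \le 1$ and $a + b < 2$, which makes the chain regular.

```latex
T = \begin{pmatrix} 1-a & a \\ b & 1-b \end{pmatrix}, \qquad
\pi' = \frac{1}{a+b}\,(b,\; a), \qquad
T^\infty = e\pi' = \frac{1}{a+b}\begin{pmatrix} b & a \\ b & a \end{pmatrix},

T^k - T^\infty = \frac{(1-a-b)^k}{a+b}\begin{pmatrix} a & -a \\ -b & b \end{pmatrix}
\quad \text{for all } k \ge 0,

Y = \sum_{k=0}^{\infty} (T^k - T^\infty)
  = \frac{1}{(a+b)^2}\begin{pmatrix} a & -a \\ -b & b \end{pmatrix}.
```

The series converges because $|1 - a - b| < 1$, and one can check directly that $\pi' T = \pi'$, that the rows of $Y$ sum to zero, and that $\pi' Y = 0$.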

The following theorem provides an exact perturbation result for the stationary distribution
of a regular Markov Chain in terms of its fundamental matrix. The theorem
is based on a result due to Schweitzer [32] (see also Haviv and Van Der Heyden [23]).

Theorem 4. Consider a regular Markov Chain with $n$ states and transition probability
matrix $T$. Let $\pi$ denote its unique stationary distribution and $Y$ denote its fundamental
matrix. Let $D$ be an $n \times n$ perturbation matrix such that the sum of the entries in each
row is equal to 0, i.e.,

$$\sum_{j=1}^{n} [D]_{ij} = 0 \qquad \text{for all } i.$$

Assume that the perturbed Markov chain with transition matrix $\hat{T} = T + D$ is regular.
Then, the perturbed Markov chain has a unique stationary distribution $\hat{\pi}$, and the matrix
$I - DY$ is nonsingular. Moreover, the change in the stationary distributions, $\rho = \hat{\pi} - \pi$,
is given by

$$\rho' = \pi' D Y (I - DY)^{-1}.$$


We use the same notation as in (10) here, given the close connection between the matrices introduced
in the next two theorems and the ones in (10).


4.2     Main  Results 

This  subsection  provides  bounds  on  the  difference  between  the  consensus  distribution 
and  the  uniform  distribution  using  the  global  properties  of  the  underlying  social  network. 
Our  method  of  analysis  will  rely  on  the  decomposition  of  the  mean  interaction  matrix 
W  given  in  (10)  into  the  social  network  matrix  T  and  the  influence  matrix  D.  Recall 
that  T  is  doubly  stochastic. 

The next theorem provides our first result on characterizing the extent of misinformation
and establishes an upper bound on the $\ell_\infty$-norm of the difference between the
stationary distribution $\bar{\pi}$ and the uniform distribution $\frac{1}{n}e$, which, from Eq. (13), also
provides a bound on the deviation between expected beliefs and the true underlying
state, $\theta$.

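Theorem 5. (a) Let $\bar{\pi}$ denote the consensus distribution. The $\ell_\infty$-norm of the difference
between $\bar{\pi}$ and $\frac{1}{n}e$ satisfies

$$\left\|\bar{\pi} - \frac{1}{n}e\right\|_\infty \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n},$$

where $\delta$ is a constant defined by

$$\delta = (1 - n\chi^d)^{\frac{1}{d}}, \qquad \chi = \min_{(i,j) \in \mathcal{E}}\left\{\frac{1}{n}\left[p_{ij}\,\frac{1-\gamma_{ij}}{2} + p_{ji}\,\frac{1-\gamma_{ji}}{2}\right]\right\},$$

and $d$ is the maximum shortest path length in the graph $(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)].

(b) Let $\bar{x}$ be the limiting random variable of the sequences $\{x_i(k)\}$, $i \in \mathcal{N}$ generated
by Eq. (4) (cf. Theorem 1). We have

$$\left|E[\bar{x}] - \frac{1}{n}\sum_{i=1}^{n} x_i(0)\right| \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n}\,\|x(0)\|_\infty.$$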

Proof. (a) Recall that the mean interaction matrix can be represented as

$$\tilde{W} = T + D,$$

[cf. Eq. (10)], i.e., $\tilde{W}$ can be viewed as a perturbation of the social network matrix $T$
by influence matrix $D$. By Lemma 10(a), the stationary distribution of the Markov
chain with transition probability matrix $T$ is given by the uniform distribution $\frac{1}{n}e$. By
the definition of the matrix $D$ [cf. Eq. (9)] and the fact that the matrices $A_{ij}$ and $J_{ij}$
are stochastic matrices with all row sums equal to one [cf. Eq. (5)], it follows that the
sum of entries of each row of $D$ is equal to 0. Moreover, by Theorem 2(a), the Markov
Chain with transition probability matrix $\tilde{W}$ is regular and has a stationary distribution
$\bar{\pi}$. Therefore, we can use the exact perturbation result given in Theorem 4 to write the
change in the stationary distributions $\frac{1}{n}e$ and $\bar{\pi}$ as

$$\left(\bar{\pi} - \frac{1}{n}e\right)' = \frac{1}{n}e' D Y (I - DY)^{-1}, \tag{14}$$

where $Y$ is the fundamental matrix of the Markov Chain with transition probability
matrix $T$, i.e.,

$$Y = \sum_{k=0}^{\infty} (T^k - T^\infty),$$

with $T^\infty = \frac{1}{n}ee'$ [cf. Theorem 3(c)]. Algebraic manipulation of Eq. (14) yields

$$\left(\bar{\pi} - \frac{1}{n}e\right)' = \bar{\pi}' D Y,$$

implying that

$$\left\|\bar{\pi} - \frac{1}{n}e\right\|_\infty \le \|DY\|_\infty, \tag{15}$$

where $\|DY\|_\infty$ denotes the matrix norm induced by the $\ell_\infty$ vector norm.
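The algebraic step from Eq. (14) to the bound (15) can be spelled out as follows. This is a sketch of one way to carry it out; the shorthand $\rho = \bar{\pi} - \frac{1}{n}e$ is introduced here only for brevity.

```latex
\rho' = \tfrac{1}{n}\, e' D Y (I - DY)^{-1}
\;\Longrightarrow\; \rho' (I - DY) = \tfrac{1}{n}\, e' D Y
\;\Longrightarrow\; \rho' = \big(\tfrac{1}{n} e + \rho\big)' D Y = \bar{\pi}' D Y .

% Each entry of \bar{\pi}' D Y is a convex combination of the entries in one column of DY:
|\rho_j| = \Big|\sum_{i=1}^{n} \bar{\pi}_i\, [DY]_{ij}\Big|
\le \max_{i} \big|[DY]_{ij}\big|
\le \max_{i} \sum_{l=1}^{n} \big|[DY]_{il}\big| = \|DY\|_\infty .
```

The second line uses only that $\bar{\pi}$ is a probability vector, so no property of $D$ or $Y$ beyond the identity in the first line is needed.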

We next obtain an upper bound on the matrix norm $\|DY\|_\infty$. By the definition of
the fundamental matrix $Y$, we have

$$DY = \sum_{k=0}^{\infty} D(T^k - T^\infty) = \sum_{k=0}^{\infty} DT^k, \tag{16}$$

where the second equality follows from the fact that the row sums of matrix $D$ are equal
to 0 and the matrix $T^\infty$ is given by $T^\infty = \frac{1}{n}ee'$.

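Given any $z(0) \in \mathbb{R}^n$ with $\|z(0)\|_\infty = 1$, let $\{z(k)\}$ denote the sequence generated by
the linear update rule

$$z(k) = T^k z(0) \qquad \text{for all } k \ge 0.$$

Then, for all $k \ge 0$, we have

$$DT^k z(0) = Dz(k),$$

which by the definition of the matrix $D$ [cf. Eq. (9)] implies

$$DT^k z(0) = \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\, z^{ij}(k), \tag{17}$$

where the vector $z^{ij}(k) \in \mathbb{R}^n$ is defined as

$$z^{ij}(k) = [J_{ij} - A_{ij}]\, z(k) \qquad \text{for all } i, j, \text{ and } k \ge 0.$$

By the definition of the matrices $J_{ij}$ and $A_{ij}$ [cf. Eq. (5)], the entries of the vector $z^{ij}(k)$
are given by

$$[z^{ij}(k)]_l = \begin{cases}
\left(\frac{1}{2} - \epsilon\right)\big(z_j(k) - z_i(k)\big) & \text{if } l = i, \\
\frac{1}{2}\big(z_j(k) - z_i(k)\big) & \text{if } l = j, \\
0 & \text{otherwise.}
\end{cases} \tag{18}$$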


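This implies that the vector norm $\|z^{ij}(k)\|_\infty$ can be upper-bounded by

$$\|z^{ij}(k)\|_\infty \le \frac{1}{2}\Big[\max_{l \in \mathcal{N}} z_l(k) - \min_{l \in \mathcal{N}} z_l(k)\Big] \qquad \text{for all } i, j, \text{ and } k \ge 0.$$

Defining $M(k) = \max_{l \in \mathcal{N}} z_l(k)$ and $m(k) = \min_{l \in \mathcal{N}} z_l(k)$ for all $k \ge 0$, this implies that

$$\|z^{ij}(k)\|_\infty \le \frac{1}{2}\big(M(k) - m(k)\big) \le \frac{1}{2}\,\delta^k \big(M(0) - m(0)\big) \qquad \text{for all } i, j, \text{ and } k \ge 0,$$

where the second inequality follows from Lemma 10(b) in Appendix C. Combining the
preceding relation with Eq. (17), we obtain

$$\|DT^k z(0)\|_\infty \le \frac{1}{2n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\,\delta^k \big(M(0) - m(0)\big).$$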

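By Eq. (16), it follows that

$$\|DYz(0)\|_\infty \le \sum_{k=0}^{\infty} \|DT^k z(0)\|_\infty \le \sum_{k=0}^{\infty} \frac{1}{2n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\,\delta^k \big(M(0) - m(0)\big) \le \frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n(1-\delta)},$$

where to get the last inequality, we used the fact that $0 \le \delta < 1$ and $M(0) - m(0) \le 1$,
which follows from $\|z(0)\|_\infty = 1$. Since $z(0)$ is an arbitrary vector with $\|z(0)\|_\infty = 1$,
this implies that

$$\|DY\|_\infty = \max_{\{z \,\mid\, \|z\|_\infty = 1\}} \|DYz\|_\infty \le \frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n(1-\delta)}.$$

Combining this bound with Eq. (15), we obtain

$$\left\|\bar{\pi} - \frac{1}{n}e\right\|_\infty \le \frac{1}{1-\delta}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{2n},$$

establishing the desired relation.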
(b) By Theorem 2(b), we have

$$E[\bar{x}] = \bar{\pi}' x(0).$$

This implies that

$$\left|E[\bar{x}] - \frac{1}{n}\sum_{i=1}^{n} x_i(0)\right| = \left|\bar{\pi}' x(0) - \frac{1}{n}e' x(0)\right| \le \left\|\bar{\pi} - \frac{1}{n}e\right\|_\infty \|x(0)\|_\infty.$$

The result follows by combining this relation with part (a).

□


Before providing the intuition for the preceding theorem, we provide a related bound
on the $\ell_2$-norm of the difference between $\bar{\pi}$ and the uniform distribution $\frac{1}{n}e$ in terms of
the second largest eigenvalue of the social network matrix $T$, and then return to the
intuition for both results.

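Theorem 6. Let $\bar{\pi}$ denote the consensus distribution (cf. Theorem 2). The $\ell_2$-norm of
the difference between $\bar{\pi}$ and $\frac{1}{n}e$ satisfies

$$\left\|\bar{\pi} - \frac{1}{n}e\right\|_2 \le \frac{1}{1 - \lambda_2(T)}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n},$$

where $\lambda_2(T)$ is the second largest eigenvalue of the matrix $T$ defined in Eq. (9).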
Proof. Following a similar argument as in the proof of Theorem 5, we obtain

$$\left\|\bar{\pi} - \frac{1}{n}e\right\|_2 \le \|DY\|_2, \tag{19}$$

where $\|DY\|_2$ is the matrix norm induced by the $\ell_2$ vector norm. To obtain an upper
bound on the matrix norm $\|DY\|_2$, we consider an initial vector $z(0) \in \mathbb{R}^n$ with $\|z(0)\|_2 = 1$
and the sequence generated by

$$z(k+1) = Tz(k) \qquad \text{for all } k \ge 0.$$

Then, for all $k \ge 0$, we have

$$DT^k z(0) = \frac{1}{n}\sum_{i,j} p_{ij}\alpha_{ij}\, z^{ij}(k), \tag{20}$$


where the entries of the vector $z^{ij}(k)$ are given by Eq. (18). We can provide an upper
bound on $\|z^{ij}(k)\|_2^2$ as

$$\|z^{ij}(k)\|_2^2 \le \frac{1}{2}\big(z_i(k) - z_j(k)\big)^2 = \frac{1}{2}\big((z_i(k) - \bar{z}) + (\bar{z} - z_j(k))\big)^2,$$

where $\bar{z} = \frac{1}{n}\sum_{l=1}^{n} z_l(k)$ for all $k$ (note that since $T$ is a doubly stochastic matrix, the
average of the entries of the vector $z(k)$ is the same for all $k$). Using the relation
$(a + b)^2 \le 2(a^2 + b^2)$ for any scalars $a$ and $b$, this yields

$$\|z^{ij}(k)\|_2^2 \le \sum_{l=1}^{n} \big(z_l(k) - \bar{z}\big)^2 = \|z(k) - \bar{z}e\|_2^2. \tag{21}$$

We have

$$z(k+1) - \bar{z}e = Tz(k) - \bar{z}e = T\big(z(k) - \bar{z}e\big),$$

where the second equality follows from the stochasticity of the matrix $T$, implying that
$Te = e$. The vector $z(k) - \bar{z}e$ is orthogonal to the vector $e$, which is the eigenvector
corresponding to the largest eigenvalue $\lambda_1 = 1$ of matrix $T$ (note that $\lambda_1 = 1$ since $T$
is a primitive and stochastic matrix). Hence, using the variational characterization of
eigenvalues, we obtain

$$\|z(k+1) - \bar{z}e\|_2^2 \le \big(z(k) - \bar{z}e\big)' T^2 \big(z(k) - \bar{z}e\big) \le \lambda_2(T)^2\, \|z(k) - \bar{z}e\|_2^2,$$

where $\lambda_2(T)$ is the second largest eigenvalue of matrix $T$, which implies

$$\|z(k) - \bar{z}e\|_2^2 \le \big(\lambda_2(T)^2\big)^k\, \|z(0) - \bar{z}e\|_2^2 \le \lambda_2(T)^{2k}.$$

Here the second inequality follows from the fact that $\|z(0)\|_2 = 1$ and $\bar{z}$ is the average
of the entries of vector $z(0)$. Combining the preceding relation with Eq. (21), we obtain

$$\|z^{ij}(k)\|_2 \le \lambda_2(T)^k \qquad \text{for all } k \ge 0.$$

By Eq. (20), this implies that

$$\|DT^k z(0)\|_2 \le \frac{1}{n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\,\lambda_2(T)^k \qquad \text{for all } k \ge 0.$$

Using the definition of the fundamental matrix $Y$, we obtain

$$\|DYz(0)\|_2 \le \sum_{k=0}^{\infty} \|DT^k z(0)\|_2 \le \sum_{k=0}^{\infty} \frac{1}{n}\Big(\sum_{i,j} p_{ij}\alpha_{ij}\Big)\,\lambda_2(T)^k = \frac{1}{1 - \lambda_2(T)}\,\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n},$$

for any vector $z(0)$ with $\|z(0)\|_2 = 1$. Combined with Eq. (19), this yields the desired
result. □

Theorem 6 characterizes the variation of the stationary distribution in terms of the
average influence, $\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n}$, and the second largest eigenvalue of the social network
matrix $T$, $\lambda_2(T)$. As is well known, the difference $1 - \lambda_2(T)$, also referred to as the
spectral gap, governs the rate of convergence of the Markov Chain induced by the social
network matrix $T$ to its stationary distribution (see [10]). In particular, the larger
$1 - \lambda_2(T)$ is, the faster the $k$th power of the transition probability matrix converges to
the stationary distribution matrix (cf. Theorem 3). When the Markov chain converges
to its stationary distribution rapidly, we say that the Markov chain is fast-mixing.
In this light, Theorem 6 shows that, in a fast-mixing graph, given a fixed average influence
$\frac{\sum_{i,j} p_{ij}\alpha_{ij}}{n}$, the consensus distribution is "closer" to the underlying $\theta = \frac{1}{n}\sum_{i=1}^{n} x_i(0)$
and the extent of misinformation is limited. This is intuitive. In a fast-mixing social
network graph, there are several connections between any pair of agents. Now for any
forceful agent, consider the set of agents who will have some influence on his beliefs. This
set itself is connected to the rest of the agents and thus obtains information from the
rest of the society. Therefore, in a fast-mixing graph (or in a society represented by such
a graph), the beliefs of forceful agents will themselves be moderated by the rest of the
society before they spread widely. In contrast, in a slowly-mixing graph, we can have a
high degree of clustering around forceful agents, so that forceful agents get their (already
limited) information intake mostly from the same agents that they have influenced. If
so, there will be only a very indirect connection from the rest of the society to the beliefs
of forceful agents and forceful agents will spread their information widely before their
opinions also adjust. As a result, the consensus is more likely to be much closer to the
opinions of forceful agents, potentially quite different from the true underlying state $\theta$.


We use the terms "spectral gap of the Markov chain" and "spectral gap of the (induced) graph",
and "fast-mixing Markov chain" and "fast-mixing graph" interchangeably in the sequel.
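The role of the spectral gap can be illustrated numerically. The sketch below is a hypothetical example, not taken from the paper: six agents on a ring, each meeting either neighbor with probability $1/2$, $\gamma_{ij} = 0$, and a single forceful link on which agent 1 influences agent 0 with probability $\alpha$ (zero-based indices). For this ring, Eq. (9) gives $T_{ii} = 1 - 1/n$ and $T_{ij} = 1/(2n)$ for neighbors. Because every $A_{ij}$ is positive semidefinite, so is $T$, and $\lambda_2(T)$ can be found by power iteration on the subspace orthogonal to $e$; that choice of method is ours.

```python
import math

n, alpha, eps = 6, 0.5, 0.1

# Social network matrix T of the ring.
T = [[0.0] * n for _ in range(n)]
for i in range(n):
    T[i][i] = 1.0 - 1.0 / n
    T[i][(i + 1) % n] = 1.0 / (2 * n)
    T[i][(i - 1) % n] = 1.0 / (2 * n)

# Influence matrix D for the single forceful link (agent 1 influences agent 0).
c = 0.5 * alpha / n                        # p_01 * alpha_01 / n
D = [[0.0] * n for _ in range(n)]
D[0][0], D[0][1] = c * (eps - 0.5), c * (0.5 - eps)
D[1][0], D[1][1] = -0.5 * c, 0.5 * c
W = [[T[r][k] + D[r][k] for k in range(n)] for r in range(n)]

def second_eigenvalue(T, iters=2000):
    """Power iteration restricted to vectors orthogonal to e (valid because T is symmetric PSD)."""
    size = len(T)
    v = [float(l) for l in range(size)]
    lam = 0.0
    for _ in range(iters):
        mean = sum(v) / size
        v = [a - mean for a in v]                      # project out the eigenvector e
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-300:
            return 0.0
        v = [a / norm for a in v]
        v = [sum(T[r][k] * v[k] for k in range(size)) for r in range(size)]
        lam = math.sqrt(sum(a * a for a in v))         # ||T v|| for a unit vector v
    return lam

def stationary(W, iters=5000):
    """Stationary distribution by repeated left multiplication pi' <- pi' W."""
    size = len(W)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[r] * W[r][k] for r in range(size)) for k in range(size)]
    return pi

lam2 = second_eigenvalue(T)
pi_bar = stationary(W)
dist = math.sqrt(sum((w - 1.0 / n) ** 2 for w in pi_bar))
bound = (0.5 * alpha) / (n * (1.0 - lam2))  # Theorem 6 with sum_ij p_ij alpha_ij = 0.5 * alpha
print(lam2, dist, bound)
```

Here the spectral gap is $1/12$, so the bound of Theorem 6 evaluates to $0.5$. The ring mixes slowly, which makes the bound loose for this example; a graph with a larger spectral gap would tighten it for the same average influence.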

This discussion also gives intuition for Theorem 5 since the constant $\delta$ in that result
is closely linked to the mixing properties of the social network matrix and the social
network graph. In particular, Theorem 5 clarifies that $\delta$ is related to the maximum
shortest path and the minimum probability of (indirect) communication between any
two agents in the society. These two notions also crucially influence the spectral gap
$1 - \lambda_2(T)$, which plays the key role in Theorem 6.

These  intuitions  are  illustrated  in  the  next  example,  which  shows  how  in  a  certain 
class  of  graphs,  misinformation  becomes  arbitrarily  small  as  the  social  network  grows. 

Example 1. (Expander Graphs) Consider a sequence of social network graphs
$\mathcal{G}_n = (\mathcal{N}_n, \mathcal{A}_n)$ induced by symmetric $n \times n$ matrices $T_n$ [cf. Eq. (11)]. Assume that this
sequence of graphs is a family of expander graphs, i.e., there exists a positive constant
$\gamma > 0$ such that the spectral gap $1 - \lambda_2(T_n)$ of the graph is uniformly bounded away
from 0, independent of the number of nodes $n$ in the graph, i.e.,

$$\gamma \le 1 - \lambda_2(T_n) \qquad \text{for all } n$$

(see [13]). As an example, the Internet has been shown to be an expander graph under the
preferential connectivity random graph model (see [27] and [24]). Expander graphs have
high connectivity properties and are fast mixing.

We consider the following influence structure superimposed on the social network
graph $\mathcal{G}_n$. We define an agent $j$ to be locally forceful if he influences a constant number of
agents in the society, i.e., his total influence, given by $\sum_i p_{ij}\alpha_{ij}$, is a constant independent
of $n$. We assume that there is a constant number of locally forceful agents. Let $\bar{\pi}_n$ denote
the stationary distribution of the Markov Chain with transition probability matrix given
by the mean interaction matrix $\tilde{W}$ [cf. Eq. (8)]. Then, it follows from Theorem 6 that

$$\left\|\bar{\pi}_n - \frac{1}{n}e\right\|_2 \to 0 \qquad \text{as } n \to \infty.$$
Figure 1: Impact of location of forceful agents on the stationary distribution. (a) Misinformation
over the bottleneck. (b) Misinformation inside a cluster.

This shows that if the social network graph is fast-mixing and there is a constant number
of locally forceful agents, then the difference between the consensus belief and the average
of the initial beliefs vanishes. Intuitively, in expander graphs, as n grows large, the set of
individuals who are the source of information of forceful agents becomes highly connected,
and thus rapidly inherits the average of the information of the rest of the society. Provided
that the number of forceful agents and the impact of each forceful agent do not grow
with n, their influence becomes arbitrarily small as n increases.

5      Connectivity  of  Forceful  Agents  and  Misinformation

The  results  provided  so  far  exploit  the  decomposition  of  the  evolution  of  beliefs  into 
the  social  network  component  (matrix  T)  and  the  influence  component  (matrix  D). 
This  decomposition  does  not  exploit  the  interactions  between  the  structure  of  the  social 
network  and  the  location  of  forceful  agents  within  it.  For  example,  forceful  agents  located 
in  different  parts  of  the  same  social  network  will  have  different  impacts  on  the  extent 
of  misinformation  in  the  society,  but  our  results  so  far  do  not  capture  this  aspect.  The 
following  example  illustrates  these  issues  in  a  sharp  way. 

Example 2. Consider a society consisting of six agents and represented by the (undirected)
social network graph shown in Figure 1. The weight of each edge {i,j} is given
by

where, for illustration, we choose $p_{ij}$ to be inversely proportional to the degree of node
i, for all j. The self-loops are not shown in Figure 1.

We distinguish two different cases as illustrated in Figure 1. In each case, there is
a single forceful agent and $\alpha = 1/2$. This is represented by a directed forceful link.
The two cases differ by the location of the forceful link, i.e., the forceful link is over the
bottleneck of the connectivity graph in part (a) and inside the left cluster in part (b).
The corresponding consensus distributions can be computed as

$$\bar\pi_a = \frac{1}{6}(1.25, 1.25, 1.25, 0.75, 0.75, 0.75)', \qquad \bar\pi_b = \frac{1}{6}(0.82, 1.18, 1, 1, 1, 1)'.$$

Even though the social network matrix T (and the corresponding graph) is the same in
both cases, the consensus distributions are different. In particular, in part (a), each agent
in the left cluster has a higher weight compared to the agents in the right cluster, while
in part (b), the weights of all agents, except for the forceful and influenced agents, are
equal and given by 1/6. This is intuitive since when the forceful link is over a bottleneck,
the misinformation of a forceful agent can spread and influence a larger portion of the
society before his opinions can be moderated by the opinions of the other agents.
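
The two consensus distributions of Example 2 can be checked numerically. The Python sketch below rests on several assumptions that are not stated in this section: each cluster in Figure 1 is taken to be a triangle (agents 1-3 and 4-6, joined by the edge {3,4}), $p_{ij} = 1/\deg(i)$, a meeting always leads to an update ($\gamma_{ij} = 0$), the influenced agent keeps none of his own belief ($\epsilon = 0$), and the mean interaction matrix is the average, over meetings, of the pairwise-averaging and forceful update matrices. All function names are hypothetical.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot row
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def stationary(P):
    """Stationary distribution of a row-stochastic matrix P (pi' P = pi', sum = 1)."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)  # normalization replaces one (redundant) balance equation
    return solve(A, [0] * (n - 1) + [1])

def mean_interaction_matrix(n, edges, forceful, eps=F(0)):
    """Assumed model: agent i is activated w.p. 1/n and meets neighbor j w.p. 1/deg(i).
    With probability forceful[(i, j)] agent j is forceful towards i; otherwise both average."""
    nbrs = {v: set() for v in range(n)}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    W = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            w = F(1, n) / len(nbrs[i])       # probability of the meeting (i, j)
            a = forceful.get((i, j), F(0))   # alpha_ij
            for k in range(n):               # bystanders keep their beliefs
                if k != i and k != j:
                    W[k][k] += w
            for r in (i, j):                 # averaging: both rows become (1/2, 1/2)
                W[r][i] += w * (1 - a) / 2
                W[r][j] += w * (1 - a) / 2
            W[i][i] += w * a * eps           # forceful meeting: i moves towards j,
            W[i][j] += w * a * (1 - eps)
            W[j][j] += w * a                 # while j keeps his belief
    return W

# Agents 1..6 are indexed 0..5; two triangles joined by the edge {3, 4}.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
half = F(1, 2)
pi_a = stationary(mean_interaction_matrix(6, EDGES, {(3, 2): half}))  # agent 3 forceful over 4
pi_b = stationary(mean_interaction_matrix(6, EDGES, {(0, 1): half}))  # agent 2 forceful over 1

print([round(float(6 * x), 2) for x in pi_a])
print([round(float(6 * x), 2) for x in pi_b])
```

Under these assumptions the computed vectors agree, up to rounding, with the values reported in the example; a different cluster shape or a positive $\epsilon$ would change the numbers but not the qualitative contrast between the two cases.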

This  example  shows  how  the  extent  of  spread  of  misinformation  varies  depending 
on  the  location  of  the  forceful  agent.  The  rest  of  this  section  provides  a  more  detailed 
analysis  of  how  the  location  and  connectivity  of  forceful  agents  affect  the  formation  of 
opinions  in  the  network.  We  proceed  as  follows.  First,  we  provide  an  alternative  exact 
characterization  of  excess  influence  using  mean  first  passage  times.  We  then  introduce 
the  concept  of  essential  edges,  similar  to  the  situation  depicted  in  Example  2,  and  provide 
sharper  exact  results  for  graphs  in  which  forceful  links  coincide  with  essential  edges.  We 
then  generalize  these  notions  to  more  general  networks  by  introducing  the  concept  of 
information  bottlenecks,  and  finally,  we  develop  new  techniques  for  determining  tighter 
upper  bounds  on  excess  influence  by  using  ideas  from  graph  clustering. 

5.1      Characterization  in  Terms  of  Mean  First  Passage  Times 

Our next main result provides an exact characterization of the excess influence of agent
i in terms of the mean first passage times of the Markov chain with transition probability
matrix T. This result, and those that follow later in this section, will be useful both to
provide more informative bounds on the extent of misinformation and also to highlight
the sources of excess influence for certain agents in the society.

We start by presenting some basic definitions and relations (see Chapter 2 of [2]).

Definition 1. Let $(X_t, t = 0, 1, 2, \ldots)$ denote a discrete-time Markov chain. We denote
the first hitting time of state i by

$$T_i = \inf\{t \ge 0 \mid X_t = i\}.$$

We define the mean first passage time from state i to state j as

$$m_{ij} = E[T_j \mid X_0 = i],$$

and the mean commute time between state i and state j as $m_{ij} + m_{ji}$. Moreover, we
define the mean first return time to a particular state i as

$$m_i^+ = E[T_i^+ \mid X_0 = i],$$

where

$$T_i^+ = \inf\{t \ge 1 \mid X_t = i\}.$$


Lemma 1. Consider a Markov chain with transition matrix Z and stationary distribution
$\pi$. We have:

(i) The mean first return time from state i to i is given by $m_i^+ = 1/\pi_i$.

(ii) The mean first passage time from i to j is given by

$$m_{ij} = \frac{Y_{jj} - Y_{ij}}{\pi_j},$$

where $Y = \sum_{k=0}^{\infty} (Z^k - Z^\infty)$ is the fundamental matrix of the Markov chain.
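
Both parts of Lemma 1 can be checked on a small chain. The sketch below uses a hypothetical three-state chain (not taken from the paper), computes the mean first passage times by solving the usual first-step equations, and obtains the fundamental matrix from the standard identity $Y = (I - Z + e\pi')^{-1} - e\pi'$, which is assumed here rather than quoted from the text.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def stationary(P):
    """Stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])

def hitting_times(P, t):
    """h[k] = E[T_t | X_0 = k]: h[t] = 0 and h[k] = 1 + sum_l P[k][l] h[l] for k != t."""
    n = len(P)
    A, b = [], []
    for k in range(n):
        if k == t:
            A.append([1 if l == t else 0 for l in range(n)])
            b.append(0)
        else:
            A.append([(1 if l == k else 0) - P[k][l] for l in range(n)])
            b.append(1)
    return solve(A, b)

def inverse(A):
    """Exact matrix inverse, one column at a time."""
    n = len(A)
    cols = [solve(A, [1 if r == c else 0 for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]

# Hypothetical 3-state birth-death chain.
Z = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
n = len(Z)
pi = stationary(Z)

# Fundamental matrix via Y = (I - Z + e pi')^{-1} - e pi'.
G = inverse([[(1 if i == j else 0) - Z[i][j] + pi[j] for j in range(n)] for i in range(n)])
Y = [[G[i][j] - pi[j] for j in range(n)] for i in range(n)]

to = [hitting_times(Z, j) for j in range(n)]            # to[j][i] = m_ij
m = [[to[j][i] for j in range(n)] for i in range(n)]    # m[i][j] = m_ij
m_plus = [1 + sum(Z[i][j] * m[j][i] for j in range(n)) for i in range(n)]  # return times

print("pi      =", pi)
print("m_i^+   =", m_plus)
print("m_ij    =", m)
```

The return times are obtained by conditioning on the first step, so comparing `m_plus[i]` with `1 / pi[i]` and `m[i][j]` with `(Y[j][j] - Y[i][j]) / pi[j]` exercises parts (i) and (ii) respectively.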

We  use  the  relations  in  the  preceding  lemma  between  the  fundamental  matrix  of 
a  Markov  chain  and  the  mean  first  passage  times  between  states,  to  provide  an  exact 
characterization  of  the  excess  influence  of  agent  k. 

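Theorem 7. Let $\bar\pi$ denote the consensus distribution. We have:

(a) For every agent k,

$$\bar\pi_k - \frac{1}{n} = \frac{1}{2n^2} \sum_{i,j} p_{ij}\alpha_{ij}\big((1 - 2\epsilon)\bar\pi_i + \bar\pi_j\big)(m_{ik} - m_{jk}).$$

(b) Let $\mathcal{A}_1$ denote the set of edges over which there is a forceful link, i.e.,

$$\mathcal{A}_1 = \big\{ \{i,j\} \in \mathcal{A} \mid \alpha_{ij} > 0 \text{ or } \alpha_{ji} > 0 \big\}.$$

Assume that for any distinct $\{i,j\}, \{k,l\} \in \mathcal{A}_1$, we have $\{i,j\} \cap \{k,l\} = \emptyset$. Then,

$$\bar\pi_k - \frac{1}{n} = \frac{1}{n^3} \sum_{i,j} \frac{p_{ij}\alpha_{ij}(1 - \epsilon)}{1 - \zeta_{ij}/n^2}\,(m_{ik} - m_{jk}),$$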


where 


vj      '^jt 


+  ejPijQ'jj  -  -Pjiftj, 


1  1^        \ 


rriH, 


(22) 


and $m_{ij}$ is the mean first passage time from state i to state j of a Markov chain with
transition matrix given by the social network matrix T [cf. Eq. (9)].

Proof. (a) Following the same line of argument as in the proof of Theorem 5, we can
use the perturbation results of Theorem 4 to write the excess influence of agent k as

$$\bar\pi_k - \frac{1}{n} = \bar\pi' D [Y]^k, \qquad (23)$$

where Y is the fundamental matrix of a Markov chain with transition matrix T and $[Y]^k$
denotes its k-th column. Using (5), and the definition of D in (9) we have

$$D[Y]^k = \frac{1}{n} \sum_{i,j} p_{ij}\alpha_{ij}\,(Y_{ik} - Y_{jk}) \Big[\big(\epsilon - \tfrac{1}{2}\big) e_i - \tfrac{1}{2} e_j\Big],$$

where $e_i$ denotes the i-th unit vector. Hence, we can write the right-hand side of Eq. (23) as follows:

$$\bar\pi' D [Y]^k = \frac{1}{2n} \sum_{i,j} p_{ij}\alpha_{ij}\big((1 - 2\epsilon)\bar\pi_i + \bar\pi_j\big)(Y_{jk} - Y_{ik}). \qquad (24)$$

By Lemma 1(ii), we have

$$Y_{jk} - Y_{ik} = \frac{1}{n}(m_{ik} - m_{jk}), \qquad (25)$$

where Y is the fundamental matrix of the Markov chain with transition matrix T. The
desired result follows by substituting the preceding relation in Eq. (24).

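(b) In view of the assumption that all edges in $\mathcal{A}_1$ are pairwise disjoint, the perturbation
matrix D decomposes into disjoint blocks, i.e.,

$$D = \sum_{\{i,j\} \in \mathcal{A}_1} (D_{ij} + D_{ji}), \quad \text{where } D_{ij} = \frac{p_{ij}\alpha_{ij}}{n}\,[J_{ij} - A_{ij}]. \qquad (26)$$

For each edge $\{i,j\} \in \mathcal{A}_1$, it is straightforward to show that

$$\big((D_{ij} + D_{ji})Y\big)^2 = \frac{\zeta_{ij}}{n^2}\,(D_{ij} + D_{ji})Y.$$

Using the decomposition in Eq. (26) and the preceding relation, it can be seen that

$$DY(I - DY)^{-1} = \sum_{i,j} \Big(1 - \frac{\zeta_{ij}}{n^2}\Big)^{-1} D_{ij} Y.$$

Combined with the exact perturbation result in Theorem 4, this implies that

$$\bar\pi_k - \frac{1}{n} = \frac{1}{n}\big[e' DY(I - DY)^{-1}\big]_k = \frac{1}{n} \sum_{i,j} \Big(1 - \frac{\zeta_{ij}}{n^2}\Big)^{-1} e' D_{ij} [Y]^k = \frac{1}{n} \sum_{i,j} \frac{p_{ij}\alpha_{ij}}{n}\cdot\frac{1 - \epsilon}{1 - \zeta_{ij}/n^2}\,(Y_{jk} - Y_{ik}).$$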

The main result follows by substituting Eq. (25) in the above equation. □

Part (a) of Theorem 7 provides an exact expression for the excess influence of agent
k as a function of the mean first passage times between agent (or state) k and the forceful and
influenced agents. The excess influence of each agent therefore depends on the relative
distance of that agent to the forceful and the influenced agent. To provide an intuition
for this result, let us consider the special case in which there is a single forceful link
(j,i) in the society (i.e., only one pair of agents i and j with $\alpha_{ij} > 0$) and thus a single
forceful agent j. Then for any agent k, the only source of excess influence is his
(potentially indirect) impact on the beliefs of the forceful agent j. This is why
$m_{jk}$, which, loosely speaking, measures the distance between j and k, enters negatively
into the expression for the excess influence of agent k. In addition, any agent who
meets (communicates) with agent i with a high probability will be indirectly influenced
by the opinions of the forceful agent j. Therefore, the excess influence of agent k is
increasing in his distance to i, thus in $m_{ik}$. In particular, in the extreme case where
$m_{ik}$ is small, agent k will have negative excess influence (because he is very close to the
heavily "influenced" agent i) and in the polar extreme, where $m_{jk}$ is small, he will have
positive excess influence (because his views will be quickly heard by the forceful agent
j). The general expression in part (a) of the theorem simply generalizes this reasoning
to general social networks with multiple forceful agents and several forceful links.

Part (b) provides an alternative expression [cf. Eq. (22)], with a similar intuition,
for the special case in which all forceful links are disjoint. The main advantage of the
expression in part (b) is that, though more complicated, it is not in terms of the expected
consensus distribution $\bar\pi$ (which is endogenous). The disjoint forceful link property in part
(b) is also useful because it enables us to isolate the effects of the forceful agents. The
parameter $\zeta_{ij}$ in Eq. (22) captures the asymmetry between the locations of agents i and
j in the underlying social network graph. Although the expression for excess influence
in part (a) of Theorem 7 is a function of the consensus distribution $\bar\pi$, each element of
this vector (distribution) can be bounded by 1 to obtain an upper bound for the excess
influence of agent k.

Using the results in Theorem 7, the difference between the consensus distributions
discussed in Example 2 can be explained as follows. In Example 2(a), the mean first
passage time from agent 4 to any agent k in the left cluster is strictly larger than that
of agent 3 to agent k, because every path from agent 4 to the left cluster must pass
through agent 3. Therefore, $m_{4k} > m_{3k}$ for k = 1, 2, 3, and agents in the left cluster have
a higher consensus weight. In Example 2(b), due to the symmetry of the social network
graph, the mean first passage times of agents 1 and 2 to any agent $k \ne 1, 2$ are the same,
hence establishing by Theorem 7 the uniform weights in the consensus distribution.
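
The identity in Theorem 7(a) can also be checked numerically on a six-agent network of the kind used in Example 2. The sketch below is illustrative only: it assumes triangle-shaped clusters, $p_{ij} = 1/\deg(i)$, $\gamma_{ij} = 0$, a hypothetical value $\epsilon = 1/4$, two hypothetical forceful links, and the same assumed form of the mean interaction matrix (averaging over meetings of the pairwise-averaging and forceful updates). The social network matrix T is obtained by switching all forceful links off.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def stationary(P):
    """Stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    A.append([1] * n)
    return solve(A, [0] * (n - 1) + [1])

def hitting_times(P, t):
    """h[k] = E[T_t | X_0 = k] for the chain with transition matrix P."""
    n = len(P)
    A, b = [], []
    for k in range(n):
        if k == t:
            A.append([1 if l == t else 0 for l in range(n)])
            b.append(0)
        else:
            A.append([(1 if l == k else 0) - P[k][l] for l in range(n)])
            b.append(1)
    return solve(A, b)

def interaction_matrix(n, nbrs, alpha, eps):
    """Assumed mean interaction matrix; alpha[(i, j)] is the probability that j forces i."""
    W = [[F(0)] * n for _ in range(n)]
    for i in range(n):
        for j in nbrs[i]:
            w = F(1, n) / len(nbrs[i])       # probability of the meeting (i, j)
            a = alpha.get((i, j), F(0))
            for k in range(n):
                if k != i and k != j:
                    W[k][k] += w             # bystanders are unchanged
            for r in (i, j):
                W[r][i] += w * (1 - a) / 2   # pairwise averaging
                W[r][j] += w * (1 - a) / 2
            W[i][i] += w * a * eps           # forceful update of agent i
            W[i][j] += w * a * (1 - eps)
            W[j][j] += w * a
    return W

n = 6
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
nbrs = {v: set() for v in range(n)}
for u, v in EDGES:
    nbrs[u].add(v)
    nbrs[v].add(u)

eps = F(1, 4)
alpha = {(3, 2): F(1, 2), (0, 1): F(1, 3)}     # hypothetical forceful links
pi_bar = stationary(interaction_matrix(n, nbrs, alpha, eps))
T = interaction_matrix(n, nbrs, {}, eps)       # social network matrix (no forceful links)
to = [hitting_times(T, k) for k in range(n)]   # to[k][i] = m_ik

rhs = []
for k in range(n):
    total = sum(F(1, len(nbrs[i])) * a
                * ((1 - 2 * eps) * pi_bar[i] + pi_bar[j])
                * (to[k][i] - to[k][j])
                for (i, j), a in alpha.items())
    rhs.append(total / (2 * n * n))

excess = [pi_bar[k] - F(1, n) for k in range(n)]
print([float(x) for x in excess])
print([float(x) for x in rhs])
```

Exact rational arithmetic is used so that the two printed vectors can be compared for equality rather than up to floating-point error.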

In the following we study the effect of the location of a forceful link on the excess
influence of each agent by characterizing the relative mean first passage time $|m_{ik} - m_{jk}|$,
in terms of the properties of the social network graph.


5.2     Forceful  Essential  Edges

In this subsection, we provide an exact characterization of the excess influence of agent k
explicitly in terms of the properties of the social network graph. We focus on the special
case when the undirected edge between the forceful and the influenced agent is essential
for the social network graph, in the sense that without this edge the graph would be
disconnected. We refer to such edges as forceful essential edges. Graphs with forceful
essential edges approximate situations in which a forceful agent, for example a media
outlet or political leader, itself obtains all of its information from a tight-knit community.
We first give the definition of an essential edge of an undirected graph.

Definition 2. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be an undirected graph. An edge $\{i,j\} \in \mathcal{A}$ is an
essential edge of the graph $\mathcal{G}$ if its removal would partition the set of nodes
into two disjoint sets $N(i,j) \subset \mathcal{N}$ with $i \in N(i,j)$, and $N(j,i) \subset \mathcal{N}$ with $j \in N(j,i)$.

The following lemma provides an exact characterization of the mean first passage
time from state i to state j, where i and j are the end nodes of an essential edge {i,j}.

Lemma 2. Consider a Markov chain with a doubly stochastic transition probability
matrix T. Let {i,j} be an essential edge of the social network graph induced by matrix
T.

(a) We have

$$m_{ij} = \frac{|N(i,j)|}{T_{ij}}.$$

(b) For every $k \in N(j,i)$,

$$m_{ik} - m_{jk} = m_{ij}.$$

Proof. Consider a Markov chain over the set of states $\mathcal{N}' = N(i,j) \cup \{j\}$, with transition
probabilities

$$\hat T_{kl} = T_{kl} \quad \text{for all } k \ne l.$$

For the new chain with stationary distribution $\hat\pi$ we have

$$\hat\pi_j = \frac{T_{ij}}{\hat T} = \frac{T_{ij}}{|N(i,j)| + T_{ij}},$$

where $\hat T$ is the total edge weight in the new chain.

Since {i,j} is essential, every path from i to j must pass through {i,j}. Moreover,
because of equivalent transition probabilities between the new Markov chain and the
original one on $\mathcal{N}'$, the mean passage time $m_{ij}$ of the original Markov chain is equal to
the mean passage time $\hat m_{ij}$ of the new chain. On the other hand, for the new chain, we can
write the mean return time to j as

$$\hat m_j^+ = 1 + \hat m_{ij} = 1 + m_{ij},$$

which implies [cf. Lemma 1(i)]

$$m_{ij} = \frac{1}{\hat\pi_j} - 1 = \frac{|N(i,j)|}{T_{ij}}.$$

The second part of the claim follows from the fact that all of the paths from i to k
must pass through {i,j}, because it is the only edge connecting N(i,j) to N(j,i). Thus,
we conclude

$$m_{ik} = m_{ij} + m_{jk}.$$

□
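
Lemma 2 can be illustrated on a small graph with an essential edge. The sketch below uses a hypothetical five-node graph (a triangle attached through the essential edge {2,3} to a two-node tail) and assumes a symmetric social network matrix of the form $T_{ij} = (p_{ij} + p_{ji})/(2n)$ with $p_{ij} = 1/\deg(i)$, the diagonal being chosen so that every row sums to one; this form is an assumption, since Eq. (9) is not reproduced in this section.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def hitting_times(P, t):
    """h[k] = E[T_t | X_0 = k] for the chain with transition matrix P."""
    n = len(P)
    A, b = [], []
    for k in range(n):
        if k == t:
            A.append([1 if l == t else 0 for l in range(n)])
            b.append(0)
        else:
            A.append([(1 if l == k else 0) - P[k][l] for l in range(n)])
            b.append(1)
    return solve(A, b)

def social_network_matrix(n, edges):
    """Assumed form: T_ij = (1/deg(i) + 1/deg(j)) / (2n) on edges, self-loops fill each row."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    T = [[F(0)] * n for _ in range(n)]
    for i, j in edges:
        T[i][j] = T[j][i] = (F(1, deg[i]) + F(1, deg[j])) / (2 * n)
    for i in range(n):
        T[i][i] = 1 - sum(T[i])
    return T

n = 5
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]   # {2, 3} is an essential edge
T = social_network_matrix(n, EDGES)

m = [[None] * n for _ in range(n)]                 # m[s][t] = mean passage time s -> t
for t in range(n):
    h = hitting_times(T, t)
    for s in range(n):
        m[s][t] = h[s]

i, j = 2, 3
N_ij, N_ji = [0, 1, 2], [3, 4]                     # the two sides of the essential edge
print("m_ij =", m[i][j], " |N(i,j)| / T_ij =", len(N_ij) / T[i][j])
print("m_ji =", m[j][i], " |N(j,i)| / T_ij =", len(N_ji) / T[i][j])
for k in N_ji:
    print("k =", k, " m_ik - m_jk =", m[i][k] - m[j][k])
```

The asymmetry between `m[i][j]` and `m[j][i]` comes only from the sizes of the two components, which is the feature that Theorem 8 exploits.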

We  use  the  relation  in  Lemma  2  to  study  the  effect  of  a  single  forceful  link  over  an 
essential  edge  on  the  excess  influence  of  each  agent. 

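Theorem 8. Let $\bar\pi$ denote the consensus distribution. Assume that there exists a single
pair {i,j} for which the influence probability $\alpha_{ij} > 0$. Assume that the edge {i,j} is an
essential edge of the social network graph. Then, we have for all k,

$$\bar\pi_k - \frac{1}{n} = \frac{2}{n^2}\cdot\frac{\theta_{ij}(1 - \epsilon)}{1 - \frac{\theta_{ij}}{n}\big((1 + 2\epsilon)|N(i,j)| - |N(j,i)|\big)}\,\Psi_{ij}(k),$$

where

$$\theta_{ij} = \frac{p_{ij}\alpha_{ij}}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})},$$

and

$$\Psi_{ij}(k) = \begin{cases} |N(i,j)|, & k \in N(j,i), \\ -|N(j,i)|, & k \in N(i,j). \end{cases}$$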


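Proof. Since edge {i,j} is essential, by Lemma 2 we have for every $k \in N(j,i)$

$$m_{ik} - m_{jk} = m_{ij} = \frac{|N(i,j)|}{T_{ij}} = \frac{2n|N(i,j)|}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}.$$

Similarly, for every $k \in N(i,j)$, we obtain

$$m_{ik} - m_{jk} = -m_{ji} = -\frac{2n|N(j,i)|}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}.$$

Combining the preceding relations, we can write for the relative mean passage time

$$m_{ik} - m_{jk} = \frac{2n}{p_{ij}(1 - \gamma_{ij}) + p_{ji}(1 - \gamma_{ji})}\,\Psi_{ij}(k).$$

Since (i,j) is the only forceful link, we can apply Theorem 7(b) to get

$$\bar\pi_k - \frac{1}{n} = \frac{1}{n^3}\cdot\frac{p_{ij}\alpha_{ij}(1 - \epsilon)}{1 - \zeta_{ij}/n^2}\,(m_{ik} - m_{jk}),$$

where $\zeta_{ij}$ is given by Eq. (22).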

Combining the above relations with Lemma 2(a) establishes the desired result. □

Theorem 8 shows that if two clusters of agents, e.g., two communities, are connected
via an essential edge over which there is a forceful link, then the excess influences of
all agents within the same cluster are equal (even when the cluster does not have any
symmetry properties). This implies that the opinions of all agents that are in the same
cluster as the forceful agent affect the consensus opinion of the society with the same
strength. This property is observed in part (a) of Example 2, in which edge {3,4} is
an essential edge. Intuitively, all of the agents in that cluster will ultimately shape
the opinions of the forceful agent and this is the source of their excess influence. The
interesting and surprising feature is that they all have the same excess influence, even if
only some of them are directly connected to the forceful agent. Loosely speaking, this
can be explained using the fact that, in the limiting distribution, it is the consensus
among this cluster of agents that will impact the beliefs of the forceful agent, and since
within this cluster there are no other forceful agents, the consensus value among them
puts equal weight on each of them (recall Corollary 1).

5.3     Information  Bottlenecks 

We now extend the ideas in Theorem 8 to more general societies. We observed in
Example 2 and Section 5.2 that influence over an essential edge can have global effects
on the consensus distribution since essential edges are "bottlenecks" of the information
flow in the network. In this subsection we generalize this idea to influential links over
bottlenecks that are not necessarily essential edges as defined in Definition 2. Our goal
is to study the impact of influential links over bottlenecks on the consensus distribution.
To achieve this goal, we return to the characterization in Theorem 7, which was in
terms of mean first passage times, and then provide a series of (successively tighter)
upper bounds on the key term $(m_{ik} - m_{jk})$ in Eq. (22) in this theorem. Our first
bound on this object will be in terms of the minimum normalized cut of a Markov chain
(induced by an undirected weighted graph), which is introduced in the next definition.
We will use the term cut of a Markov chain (or cut of an undirected graph) to denote
a partition of the set of states of a Markov chain (or equivalently the nodes of the
corresponding graph) into two sets.

Definition 3. Consider a Markov chain with set of states $\mathcal{N}$, symmetric transition
probability matrix Z, and stationary distribution $\pi$. The minimum normalized cut
value (or conductance) of the Markov chain, denoted by $\rho$, is defined as

$$\rho = \inf_{S \subset \mathcal{N}} \frac{Q(S, S^c)}{\pi(S)\,\pi(S^c)}, \qquad (27)$$

where $Q(A, B) = \sum_{i \in A,\, j \in B} \pi_i Z_{ij}$, and $\pi(S) = \sum_{i \in S} \pi_i$. We refer to the cut that achieves
the minimum in this optimization problem as the minimum normalized cut.

The objective in the optimization problem in (27) is the (normalized) conditional
probability that the Markov chain makes a transition from a state in set S to a state
in set $S^c$ given that the initial state is in S. The minimum normalized cut therefore
characterizes how fast the Markov chain will escape from any part of the state space,
hence is an appropriate measure of information bottlenecks or the mixing time of the
underlying graph. Clearly, the minimum normalized cut value is larger in more connected
graphs.
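
For small chains the minimum normalized cut of Definition 3 can be found by enumerating all cuts. The following sketch does this for a hypothetical six-node network made of two triangles joined by a single edge, assuming a symmetric social network matrix $T_{ij} = (p_{ij} + p_{ji})/(2n)$ with $p_{ij} = 1/\deg(i)$ (an assumed form) and hence a uniform stationary distribution. Brute-force enumeration is exponential in n and is meant only as an illustration.

```python
from fractions import Fraction as F
from itertools import combinations

def social_network_matrix(n, edges):
    """Assumed form: T_ij = (1/deg(i) + 1/deg(j)) / (2n) on edges, self-loops fill each row."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    T = [[F(0)] * n for _ in range(n)]
    for i, j in edges:
        T[i][j] = T[j][i] = (F(1, deg[i]) + F(1, deg[j])) / (2 * n)
    for i in range(n):
        T[i][i] = 1 - sum(T[i])
    return T

def min_normalized_cut(Z, pi):
    """Return (rho, S) minimizing Q(S, S^c) / (pi(S) pi(S^c)) over all cuts."""
    n = len(Z)
    best = None
    for size in range(1, n):
        for rest in combinations(range(1, n), size - 1):
            S = (0,) + rest                      # node 0 is kept in S so each cut appears once
            Sc = [v for v in range(n) if v not in S]
            Q = sum(pi[i] * Z[i][j] for i in S for j in Sc)
            pS = sum(pi[i] for i in S)
            value = Q / (pS * (1 - pS))
            if best is None or value < best[0]:
                best = (value, S)
    return best

n = 6
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
T = social_network_matrix(n, EDGES)
pi = [F(1, n)] * n                               # uniform, since T is symmetric
rho, S = min_normalized_cut(T, pi)
print("rho =", rho, "attained by S =", S)
```

In this example the minimizing cut separates the two triangles, i.e., it runs across the single connecting edge, which matches the interpretation of the minimum normalized cut as a bottleneck.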

The next lemma provides a relation between the maximum mean commute time of
a Markov chain (induced by an undirected graph) and the minimum normalized cut of
the chain, which is presented in Section 5.3 of Aldous and Fill [2]. This result will then
be used in the next theorem to provide an improved bound on the excess influences by
using the fact that $|m_{ik} - m_{jk}| \le \max\{m_{ij}, m_{ji}\}$ (see, in particular, the proof of Theorem
9).

Lemma 3. Consider an n-state Markov chain with transition matrix Z and stationary
distribution $\pi$. Let $\rho$ denote the minimum normalized cut value of the Markov chain
(cf. Definition 3). The maximum mean commute time satisfies the following relation:

$$\max_{i,j}\{m_{ij} + m_{ji}\} \le \frac{4(1 + \log n)}{\rho \min_k \pi_k}. \qquad (28)$$

We use the preceding relation together with our characterization of excess influence
in terms of mean first passage times in Theorem 7 to obtain a tighter upper bound on the
$\ell_\infty$ norm of excess influence than in Theorem 5. This result, which is stated next, both
gives a more directly interpretable limit on the extent of misinformation in the society
and also shows the usefulness of the characterization in terms of mean first passage times
in Theorem 7.

Theorem 9. Let $\bar\pi$ denote the consensus distribution. Then, we have

$$\Big\| \bar\pi - \frac{1}{n} e \Big\|_\infty \le \sum_{i,j} \frac{2 p_{ij}\alpha_{ij}}{n}\cdot\frac{1 + \log n}{\rho},$$

where $\rho$ is the minimum normalized cut value of the Markov chain with transition
probability matrix given by the social network matrix T (cf. Definition 3).

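Proof. By Theorem 7 we have for every k

$$\Big| \bar\pi_k - \frac{1}{n} \Big| \le \frac{1}{2n^2} \sum_{i,j} p_{ij}\alpha_{ij}\,|m_{ik} - m_{jk}| \le \frac{1}{2n^2} \sum_{i,j} p_{ij}\alpha_{ij}\max\{m_{ij}, m_{ji}\} \qquad (29)$$

$$\le \sum_{i,j} \frac{2 p_{ij}\alpha_{ij}}{n}\cdot\frac{1 + \log n}{\rho},$$

where the first inequality uses $|(1 - 2\epsilon)\bar\pi_i + \bar\pi_j| \le 1$, (29) holds because $m_{ik} \le m_{ij} + m_{jk}$ and $m_{jk} \le m_{ji} + m_{ik}$, and the last inequality
follows from Eq. (28), and the fact that $\pi = \frac{1}{n} e$. □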

One advantage of the result in this theorem is that the bound is in terms of $\rho$, the
minimum normalized cut of the social network graph. As emphasized in Definition 3,
this notion is related to the strength of (indirect) communication links in the society.
Although the bound in Theorem 9 is tighter than the one we provided in Theorem 5, it
still leaves some local information unexploited because it focuses on the maximum mean
commute times between all states of a Markov chain. The following example shows how
this bound may be improved further by focusing on the mean commute time between
the forceful and the influenced agents.

Example 3. (Barbell graph) The barbell graph consists of two complete graphs each
with $n_1$ nodes that are connected via a path that consists of $n_2$ nodes (cf. Figure 2).
Consider the asymptotic behavior

$$n \to \infty, \quad n_1/n \to \nu, \quad n_2/n \to 1 - 2\nu,$$

where $n = 2n_1 + n_2$ denotes the total number of nodes in the barbell graph, and
$0 < \nu < 1/2$. The mean first passage time from a particular node in the left bell to a node in
the right bell is $O(n^3)$ as $n \to \infty$, while the mean passage time between any two nodes
in each bell is $O(n)$ (see Chapter 5 of [2] for exact results). Consider a situation where
there is a single forceful link in the left bell.

The minimum normalized cut for this example is given by cut $C_0$, with normalized
cut value $O(1/n)$, which captures the bottleneck in the global network structure. Since
the only forceful agent is within the left bell in this example, we expect the flow of
information to be limited by cuts that separate the forceful and the influenced agent,
and partition the left bell. Since the left bell is a complete graph, the cuts associated
with this part of the graph will have higher normalized cut values, thus yielding tighter
bounds on the excess influence of the agents. In what follows, we consider bounds in
terms of "relative cuts" in the social network graph that separate forceful and influenced
agents in order to capture bottlenecks in the spread of misinformation (for example, cuts
$C_1$, $C_2$, and $C_3$ in Figure 2).

5.4     Relative  Cuts 

The objective of this section is to improve our characterization of the extent of
misinformation in terms of information bottlenecks. To achieve this objective, we introduce
a new concept, relative cuts, and then show how this new concept is useful to derive
improved upper bounds on the excess influence of different individuals and on the extent of
misinformation. Our strategy is to develop tighter bounds on the mean commute times
between the forceful and influenced agents in terms of relative cut values. Together with
Theorem 7, this enables us to provide bounds on the excess influence as a function of
the properties of the social network graph and the location of the forceful agents within
it.

Figure 2: The barbell graph with $n_1 = 8$ nodes in each bell and $n_2 = 4$. There is a single
forceful link, represented by a directed link in the left bell.


Definition 4. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be an undirected graph with edge {i,j} weight given by
$w_{ij}$. The minimum relative cut value between a and b, denoted by $c_{ab}$, is defined as

$$c_{ab} = \inf\Big\{ \sum_{\{i,j\} \in \mathcal{A},\ i \in S,\ j \in S^c} w_{ij} \;\Big|\; S \subset \mathcal{N},\ a \in S,\ b \notin S \Big\}.$$

We refer to the cut that achieves the minimum in this optimization problem as the
minimum relative cut.

The next theorem uses the extremal characterization of the mean commute times
presented in Appendix D, Lemma 11, to provide bounds on the mean commute times in
terms of minimum relative cut values.

Theorem 10. Let $\mathcal{G} = (\mathcal{N}, \mathcal{A})$ be the social network graph induced by the social network
matrix T and consider a Markov chain with transition matrix T. For any $a, b \in \mathcal{N}$, the
mean commute time between a and b satisfies

$$\frac{n}{c_{ab}} \le m_{ab} + m_{ba} \le \frac{n^2}{c_{ab}}, \qquad (30)$$

where $c_{ab}$ is the minimum relative cut value between a and b (cf. Definition 4).


Proof. For the lower bound we exploit the extremal characterization of the mean commute
time given by Eq. (54) in Lemma 11. For any $S \subset \mathcal{N}$ containing a and not
containing b, pick the function $g_S$ as follows:

$$g_S(i) = \begin{cases} 0, & i \in S, \\ 1, & \text{otherwise.} \end{cases}$$

The function $g_S$ is feasible for the maximization problem in Eq. (54). Hence,

$$m_{ab} + m_{ba} \ge \frac{1}{\mathcal{E}(g_S, g_S)} = n\Big[\frac{1}{2}\sum_{i,j} T_{ij}\,\big(g_S(i) - g_S(j)\big)^2\Big]^{-1} = \frac{n}{\sum_{i \in S,\ j \in S^c} T_{ij}} \quad \text{for all } S \subset \mathcal{N},\ a \in S,\ b \notin S.$$

The tightest lower bound can be obtained by taking the largest right-hand side in the
above relation, which gives the desired lower bound.

For the upper bound, similar to Proposition 2 in Chapter 4 of [2], we use the second
characterization of the mean commute time presented in Lemma 11. Note that any unit
flow from a to b is feasible in the minimization problem in Eq. (55). The max-flow min-cut
theorem implies that there exists a flow $f^*$ of size $c_{ab}$ from a to b such that $|f^*_{ij}| \le T_{ij}$ for
all edges $\{i,j\} \in \mathcal{A}$. Therefore, there exists a unit flow $f = f^*/c_{ab}$ from a to b such
that $|f_{ij}| \le T_{ij}/c_{ab}$ for all edges {i,j}. By deleting flows around cycles we may assume
that

$$\sum_j |f_{ij}| \le \begin{cases} 1, & i = a, b, \\ 2, & \text{otherwise.} \end{cases} \qquad (31)$$

Therefore, by invoking Lemma 11 from Appendix D, we obtain

$$m_{ab} + m_{ba} \le \Big(\sum_{i,j} T_{ij}\Big) \sum_{\{i,j\} \in \mathcal{A}} \frac{f_{ij}^2}{T_{ij}} \le \frac{n}{c_{ab}} \sum_{\{i,j\} \in \mathcal{A}} |f_{ij}| \le \frac{n^2}{c_{ab}},$$

where the last inequality follows from (31). □
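
The two-sided bound of Theorem 10 can be checked by brute force on a small network. The sketch below uses a hypothetical six-node graph (two triangles joined by one edge) and an assumed symmetric social network matrix $T_{ij} = (p_{ij} + p_{ji})/(2n)$ with $p_{ij} = 1/\deg(i)$; the minimum relative cut is found by enumerating all sets S with a in S and b not in S, which is feasible only for very small n.

```python
from fractions import Fraction as F
from itertools import combinations

def solve(A, b):
    """Solve the square linear system A x = b exactly (Gauss-Jordan elimination)."""
    n = len(A)
    M = [[F(x) for x in A[r]] + [F(b[r])] for r in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def hitting_times(P, t):
    """h[k] = E[T_t | X_0 = k] for the chain with transition matrix P."""
    n = len(P)
    A, b = [], []
    for k in range(n):
        if k == t:
            A.append([1 if l == t else 0 for l in range(n)])
            b.append(0)
        else:
            A.append([(1 if l == k else 0) - P[k][l] for l in range(n)])
            b.append(1)
    return solve(A, b)

def social_network_matrix(n, edges):
    """Assumed form: T_ij = (1/deg(i) + 1/deg(j)) / (2n) on edges, self-loops fill each row."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    T = [[F(0)] * n for _ in range(n)]
    for i, j in edges:
        T[i][j] = T[j][i] = (F(1, deg[i]) + F(1, deg[j])) / (2 * n)
    for i in range(n):
        T[i][i] = 1 - sum(T[i])
    return T

def min_relative_cut(T, a, b):
    """Brute-force c_ab: least total weight leaving a set S with a in S and b not in S."""
    n = len(T)
    others = [v for v in range(n) if v not in (a, b)]
    best = None
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            S = (a,) + extra
            cut = sum(T[i][j] for i in S for j in range(n) if j not in S)
            best = cut if best is None else min(best, cut)
    return best

n = 6
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
T = social_network_matrix(n, EDGES)
to = [hitting_times(T, t) for t in range(n)]       # to[t][s] = mean passage time s -> t

report = []                                        # (a, b, lower, commute, upper)
for a in range(n):
    for b in range(a + 1, n):
        c_ab = min_relative_cut(T, a, b)
        commute = to[b][a] + to[a][b]
        report.append((a, b, n / c_ab, commute, n * n / c_ab))

for a, b, lower, commute, upper in report:
    print(a, b, float(lower), float(commute), float(upper))
```

For the pair joined by the connecting edge the lower bound is attained, because the unique path between the two clusters makes the indicator function of one cluster an optimal choice in the extremal characterization.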

The minimum relative cut for the barbell graph in Example 3 is given by cut $C_1$ with
relative cut value $O(1)$. An alternative relative cut between the forceful and influenced
agents that partitions the left bell is cut $C_3$, which has relative cut value $O(n)$, and
therefore yields a tighter bound on the mean commute times. Comparing cut $C_1$ to cut
$C_3$, we observe that $C_3$ is a balanced cut, i.e., it partitions the graph into parts each with
a fraction of the total number of nodes, while cut $C_1$ is not balanced. In order to avoid
unbalanced cuts, we introduce the notion of a normalized relative cut between two nodes
which is a generalization of the normalized cut presented in Definition 3.

Definition 5. Consider a Markov chain with set of states $\mathcal{N}$, transition probability
matrix Z, and stationary distribution $\pi$. The minimum normalized relative cut value
between a and b, denoted by $\rho_{ab}$, is defined as

$$\rho_{ab} = \inf_{S \subset \mathcal{N}} \Big\{ \frac{Q(S, S^c)}{\pi(S)\,\pi(S^c)} \;\Big|\; a \in S,\ b \notin S \Big\},$$

where $Q(A, B) = \sum_{i \in A,\, j \in B} \pi_i Z_{ij}$, and $\pi(S) = \sum_{i \in S} \pi_i$. We refer to the cut that achieves
the minimum in this optimization problem as the minimum normalized relative cut.

The next theorem provides a bound on the mean commute time between two nodes
a and b as a function of the minimum normalized relative cut value between a and b.

Theorem 11. Consider a Markov chain with set of states $\mathcal{N}$, transition probability
matrix Z, and uniform stationary distribution. For any $a, b \in \mathcal{N}$, we have

$$m_{ab} + m_{ba} \le \frac{3n \log n}{\rho_{ab}},$$

where $\rho_{ab}$ is the minimum normalized relative cut value between a and b (cf. Definition
5).

Proof. We present a generalization of the proof of Lemma 3 by Aldous and Fill [2], for
the notion of normalized relative cuts. The proof relies on the characterization of the
mean commute time given by Lemma 11 in Appendix D. For a function $0 \le g \le 1$ with
$g(a) = 0$ and $g(b) = 1$, order the nodes as $a = 1, 2, \ldots, n = b$ so that $g$ is increasing.
The Dirichlet form (cf. Definition 8) can be written as
\begin{align*}
\mathcal{E}(g, g) &= \sum_{i} \sum_{k > i} \pi_i Z_{ik} \,(g(k) - g(i))^2 \\
&\ge \sum_{i} \sum_{k > i} \sum_{i \le j < k} \pi_i Z_{ik} \,(g(j+1) - g(j))^2 \\
&= \sum_{j=1}^{n-1} (g(j+1) - g(j))^2 \, Q(A_j, A_j^c) \\
&\ge \sum_{j=1}^{n-1} (g(j+1) - g(j))^2 \, \rho_{ab}\, \pi(A_j)\, \pi(A_j^c), \tag{32}
\end{align*}
where $A_j = \{1, 2, \ldots, j\}$, and the last inequality is true by Definition 5. On the other
hand, we have
\[ 1 = g(b) - g(a) = \sum_{j=1}^{n-1} (g(j+1) - g(j)) \left(\rho_{ab}\, \pi(A_j)\, \pi(A_j^c)\right)^{1/2} \left(\rho_{ab}\, \pi(A_j)\, \pi(A_j^c)\right)^{-1/2}. \]
Using the Cauchy-Schwarz inequality and Eq. (32), we obtain
\[ \frac{1}{\mathcal{E}(g, g)} \le \frac{1}{\rho_{ab}} \sum_{j=1}^{n-1} \frac{1}{\pi(A_j)\, \pi(A_j^c)}. \tag{33} \]
But $\pi(A_j) = j/n$, because the stationary distribution of the Markov chain is uniform.
Thus,
\[ \sum_{j=1}^{n-1} \frac{1}{\pi(A_j)\, \pi(A_j^c)} = \sum_{j=1}^{n-1} \frac{n^2}{j(n-j)} \le 3 n \log n. \]
Therefore, by applying the above relation to Eq. (33) we conclude
\[ \frac{1}{\mathcal{E}(g, g)} \le \frac{3 n \log n}{\rho_{ab}}. \]
The above relation is valid for every function $g$ feasible for the maximization problem
in Eq. (54). Hence, the desired result follows from the extremal characterization of the
mean commute time given by Lemma 11. □
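
The counting step in the proof can be checked numerically. The following Python sketch evaluates the sum $\sum_{j=1}^{n-1} n^2 / (j(n-j))$ that arises when $\pi(A_j) = j/n$ and compares it with $3 n \log n$. The function name `uniform_cut_sum` is illustrative and does not come from the paper.

```python
import math

def uniform_cut_sum(n: int) -> float:
    """Sum over j = 1..n-1 of 1 / (pi(A_j) * pi(A_j^c)) with pi(A_j) = j/n."""
    return sum(n * n / (j * (n - j)) for j in range(1, n))

# Compare the sum with the bound 3 n log n used in the proof.
for n in (2, 5, 50, 500):
    lhs = uniform_cut_sum(n)
    rhs = 3 * n * math.log(n)
    print(n, round(lhs, 2), round(rhs, 2), lhs <= rhs)
```

Since $n^2/(j(n-j)) = n\,(1/j + 1/(n-j))$, the sum equals $2n$ times the $(n-1)$-th harmonic number, which is why a bound of order $n \log n$ is the natural one here.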

Note that the minimum normalized cut value $\rho$ of a Markov chain in Definition 3 can
be related to normalized relative cut values as follows:
\[ \rho = \inf_{a \ne b \in \mathcal{N}} \rho_{ab}. \]
Therefore, the upper bound given in Theorem 11 for the mean commute time is always
tighter than that provided in Lemma 3.

Let us now examine our new characterization in the context of Example 3. The
minimum normalized relative cut is given by cut $C_2$ with (normalized relative cut) value
$O(1)$. Despite the fact that $C_2$ is a balanced cut with respect to the entire graph, it
is not a balanced cut in the left bell. Therefore, it yields a worse upper bound on
mean commute times compared to cut $C_3$ [which has value $O(n)$]. These considerations
motivate us to consider balanced cuts within subsets of the original graph. In the
following we obtain tighter bounds on the mean commute times by considering relative
cuts in a subset of the original graph.

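Definition 6. Consider a weighted undirected graph, $(\mathcal{N}, \mathcal{A})$, with edge $\{i, j\}$ weight
given by $w_{ij}$. For any $S \subseteq \mathcal{N}$, we define the subgraph of $(\mathcal{N}, \mathcal{A})$ with respect to $S$ as
a weighted undirected graph, denoted by $(S, \mathcal{A}_S)$, where $\mathcal{A}_S$ contains all edges of the
original graph connecting nodes in $S$ with the following weights:
\[ \tilde{w}_{ij} = \begin{cases} w_{ij}, & i \ne j, \\ w_{ii} + \sum_{k \in S^c} w_{ik}, & i = j. \end{cases} \]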

The  next  lemma  uses  the  Monotonicity  Law  presented  in  Appendix  D,  Lemma  12  to 
relate  the  mean  commute  times  within  a  subgraph  to  the  mean  commute  times  of  the 
original  graph. 

Lemma 4. Let $G = (\mathcal{N}, \mathcal{A})$ be an undirected graph with edge $\{i, j\}$ weight given by
$w_{ij}$. Consider a Markov chain induced by this graph and denote the mean first passage
times between states $i$ and $j$ by $m_{ij}$. We fix nodes $a, b \in \mathcal{N}$, and $S \subseteq \mathcal{N}$ containing $a$
and $b$. Consider a subgraph of $(\mathcal{N}, \mathcal{A})$ with respect to $S$ (cf. Definition 6) and let $\tilde{m}_{ij}$
denote the mean first passage time between states $i$ and $j$ for the Markov chain induced
by this subgraph. We have
\[ m_{ab} + m_{ba} \le \frac{w}{w(S)} \left(\tilde{m}_{ab} + \tilde{m}_{ba}\right), \]
where $w$ is the total edge weight of the original graph, and $w(S)$ is the total edge weight
of the subgraph, i.e., $w(S) = \sum_{i \in S} \sum_{j \in \mathcal{N}} w_{ij}$.
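Proof. Consider an undirected graph $(\mathcal{N}, \mathcal{A})$ with modified edge weights $\bar{w}_{ij}$ given by
\[ \bar{w}_{ij} = \begin{cases} w_{ij}, & i \ne j \in S, \text{ or } i \ne j \in S^c; \\ 0, & i \in S,\ j \in S^c; \\ w_{ii} + \sum_{k \in S^c} w_{ik}, & i = j \in S, \end{cases} \]
and symmetrically $\bar{w}_{ii} = w_{ii} + \sum_{k \in S} w_{ik}$ for $i \in S^c$.
Hence, $\bar{w}_{ij} \le w_{ij}$ for all $i \ne j$, but the total edge weight $w$ remains unchanged. By the
Monotonicity Law (cf. Lemma 12), the mean commute time in the original graph is
bounded by that of the modified graph, i.e.,
\[ m_{ab} + m_{ba} \le \bar{m}_{ab} + \bar{m}_{ba}. \tag{34} \]
The mean commute time in the modified graph can be characterized using Lemma
11 in terms of the Dirichlet form defined in Definition 8. In particular,
\begin{align*}
\left(\bar{m}_{ab} + \bar{m}_{ba}\right)^{-1} &= \inf_{0 \le g \le 1} \left\{ \frac{1}{w} \sum_{i < j} \bar{w}_{ij}\, (g(i) - g(j))^2 : g(a) = 0,\ g(b) = 1 \right\} \\
&= \inf_{0 \le g \le 1} \left\{ \frac{1}{w} \sum_{i < j,\ i, j \in S} w_{ij}\, (g(i) - g(j))^2 : g(a) = 0,\ g(b) = 1 \right\} \\
&= \frac{w(S)}{w} \inf_{0 \le g \le 1} \left\{ \sum_{i < j,\ i, j \in S} \frac{w_{ij}}{w(S)}\, (g(i) - g(j))^2 : g(a) = 0,\ g(b) = 1 \right\} \\
&= \left(\tilde{m}_{ab} + \tilde{m}_{ba}\right)^{-1} \frac{w(S)}{w},
\end{align*}
where the second equality holds by the definition of $\bar{w}$ (the modified graph has no edges
between $S$ and $S^c$, so $g$ can be taken constant on $S^c$), and the last equality is given by
the definition of the subgraph weights and the extremal characterization of the mean
commute time in the subgraph. The desired result is established by combining the above
relation with (34). □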
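
As an illustration of Lemma 4, the sketch below computes mean first passage times exactly on a small weighted graph and on its subgraph, then compares the two commute times. It assumes the subgraph weights fold every edge leaving $S$ into a self-loop (our reading of Definition 6), and all function names are hypothetical helpers rather than anything defined in the paper.

```python
from fractions import Fraction

def hitting_times(w, target):
    """Mean first passage times to `target` for the random walk with
    transition probabilities P[i][j] = w[i][j] / sum_k w[i][k]."""
    others = [i for i in range(len(w)) if i != target]
    rows = []
    for i in others:
        deg = sum(w[i])
        # Row of the linear system (I - P) m = 1 on the non-target states.
        row = [Fraction(int(i == j)) - Fraction(w[i][j], deg) for j in others]
        rows.append(row + [Fraction(1)])
    size = len(others)
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(size):
        pivot = next(r for r in range(col, size) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(size):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return {i: rows[idx][size] for idx, i in enumerate(others)}

def commute_time(w, a, b):
    return hitting_times(w, b)[a] + hitting_times(w, a)[b]

def subgraph_weights(w, nodes):
    """Keep weights inside `nodes`; fold edges leaving `nodes` into self-loops."""
    outside = [k for k in range(len(w)) if k not in nodes]
    sub = [[w[i][j] for j in nodes] for i in nodes]
    for pos, i in enumerate(nodes):
        sub[pos][pos] += sum(w[i][k] for k in outside)
    return sub

# Path graph 0 - 1 - 2 with unit (integer) weights, and S = {0, 1}.
w = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
sub = subgraph_weights(w, [0, 1])
total, total_sub = sum(map(sum, w)), sum(map(sum, sub))
print(commute_time(w, 0, 1), Fraction(total, total_sub) * commute_time(sub, 0, 1))
```

On this toy graph the two sides of the inequality coincide, because removing node 2 does not change the effective resistance between nodes 0 and 1.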

Theorem 12. Let $G = (\mathcal{N}, \mathcal{A})$ be the social network graph induced by the social network
matrix $T$ and consider a Markov chain with transition matrix $T$. For any $a, b \in \mathcal{N}$, and
any $S \subseteq \mathcal{N}$ containing $a$ and $b$, we have
\[ m_{ab} + m_{ba} \le \frac{3 n \log |S|}{\rho_{ab}(S)}, \]
where $\rho_{ab}(S)$ is the minimum normalized cut value between $a$ and $b$ on the subgraph of
$(\mathcal{N}, \mathcal{A})$ with respect to $S$, i.e.,
\[ \rho_{ab}(S) = \inf_{S' \subset S} \left\{ \frac{|S| \sum_{i \in S',\, j \in S \setminus S'} w_{ij}}{|S'| \cdot |S \setminus S'|} \;\middle|\; a \in S',\ b \notin S' \right\}. \tag{35} \]


Proof. By Lemma 4, we have
\[ m_{ab} + m_{ba} \le \frac{w}{w(S)} \left(\tilde{m}_{ab} + \tilde{m}_{ba}\right) = \frac{n}{|S|} \left(\tilde{m}_{ab} + \tilde{m}_{ba}\right), \tag{36} \]
where $\tilde{m}_{ab}$ is the mean first passage time on the subgraph $(S, \mathcal{A}_S)$.

On the other hand, Definition 6 implies that for the subgraph $(S, \mathcal{A}_S)$, we have for
every $i \in S$
\[ \sum_{k \in S} \tilde{w}_{ik} = \tilde{w}_{ii} + \sum_{k \in S \setminus \{i\}} w_{ik} = \sum_{k \in \mathcal{N}} w_{ik} = \sum_{k \in \mathcal{N}} T_{ik} = 1. \]
Hence, the stationary distribution of the Markov chain on the subgraph is uniform.
Therefore, we can apply Theorem 11 to relate the mean commute time within the subgraph
$(S, \mathcal{A}_S)$ to its normalized relative cuts, i.e.,
\[ \tilde{m}_{ab} + \tilde{m}_{ba} \le \frac{3 |S| \log |S|}{\rho_{ab}(S)}, \]
where $\rho_{ab}(S)$ is the minimum normalized cut between $a$ and $b$ given by Definition 5 on
the subgraph. Since the stationary distribution of the random walk on the subgraph is
uniform, we can rewrite $\rho_{ab}(S)$ as in (35). Combining the above inequality with Eq. (36)
establishes the theorem. □
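
For small graphs, the quantity $\rho_{ab}(S)$ can be evaluated by exhaustive search. The sketch below assumes the normalization $|S| \sum_{i \in S', j \in S \setminus S'} w_{ij} / (|S'| \cdot |S \setminus S'|)$, which is our reading of Eq. (35), and enumerates every $S' \subset S$ that contains $a$ but not $b$. The function name `rho_ab` is illustrative, and the search is exponential in $|S|$, so this is only meant for toy examples.

```python
import math
from itertools import combinations

def rho_ab(w, nodes, a, b):
    """Brute-force minimum normalized cut between a and b within `nodes`."""
    rest = [v for v in nodes if v not in (a, b)]
    best = float("inf")
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            side = {a, *extra}                      # candidate S' (contains a, not b)
            other = [v for v in nodes if v not in side]
            cut = sum(w[i][j] for i in side for j in other)
            best = min(best, len(nodes) * cut / (len(side) * len(other)))
    return best

# Complete graph on 4 nodes with T_ij = 1/4 (self-loops included).
T = [[0.25] * 4 for _ in range(4)]
rho = rho_ab(T, [0, 1, 2, 3], 0, 3)
print(rho, 3 * len(T) * math.log(4) / rho)   # cut value and the Theorem 12 bound
```

On this example every separating cut has the same normalized value, and the resulting bound $3 \cdot 4 \log 4$ sits above the exact commute time of 8 for the uniform chain on four states.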

Theorem 12 states that if the local neighborhood around the forceful links is highly
connected, the mean commute times between the forceful and the influenced agents
will  be  small,  implying  a  smaller  excess  influence  for  all  agents,  hence  limited  spread  of 
misinformation  in  the  society.  The  economic  intuition  for  this  result  is  similar  to  that  for 
our  main  characterization  theorems:  forceful  agents  get  (their  limited)  information  intake 
from  their  local  neighborhoods.  When  these  local  neighborhoods  are  also  connected 
to  the  rest  of  the  network,  forceful  agents  will  be  indirectly  influenced  by  the  rest  of 
the  society  and  this  will  limit  the  spread  of  their  (potentially  extreme)  opinions.  In 
contrast,  when  their  local  neighborhoods  obtain  most  of  their  information  from  the 
forceful  agents,  the  opinions  of  these  forceful  agents  will  be  reinforced  (rather  than 
moderated)  and  this  can  significantly  increase  their  excess  influence  and  the  potential 
spread  of  misinformation. 

Let us revisit Example 3, and apply the result of Theorem 12 where the selected
subgraph is the left cluster of nodes. The left bell is approximately a complete graph.
We observe that the minimum normalized cut in the subgraph would be of the form of
$C_3$ in Figure 2, and hence the upper bound on the mean commute time between $i$ and $j$
is $O(n \log n)$, which is close to the mean commute time on a complete graph of size $n$.

Note  that  it  is  possible  to  obtain  the  tightest  upper  bound  on  mean  commute  time 
between  two  nodes  by  minimizing  the  bound  in  Theorem  12  over  all  subgraphs  S  of  the 
social  network  graph.    However,  exhaustive  search  over  all  subgraphs  is  not  appealing 
from a computational point of view. Intuitively, for any two particular nodes, the goal
is  to  identify  whether  such  nodes  are  highly  connected  by  identifying  a  cluster  of  nodes 
containing  them,  or  a  bottleneck  that  separates  them.  In  the  following  section  we  present 
a  hierarchical  clustering  method  to  obtain  such  a  cluster  using  a  recursive  approach. 

5.4.1      Graph  Clustering 

We next present a graph clustering method to provide tighter bounds on the mean
commute time between two nodes $a$ and $b$ by systematically searching over subgraphs $S$
of the social network graph that would yield improved normalized cut values. The goal
of this exercise is again to improve the bounds on the term $(m_{ik} - m_{jk})$ in Eq. (22) in
Theorem 7.

The  following  algorithm  is  based  on  successive  graph  cutting  using  the  notion  of 
minimum  normalized  cut  value  defined  in  Definition  3.  This  approach  is  similar  to 
the  graph  partitioning  approach  of  Shi  and  Malik  [34]  applied  to  image  segmentation 
problems. 

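Algorithm 1. Fix nodes $a, b$ on the social network graph $(\mathcal{N}, \mathcal{A})$. Perform the following
steps:

1. $k = 0$, $S_k = \mathcal{N}$.

2. Define $\rho_k$ as
\[ \rho_k = \inf_{S \subset S_k} \frac{|S_k| \sum_{i \in S,\, j \in S_k \setminus S} w_{ij}}{|S| \cdot |S_k \setminus S|}, \]
with $S_k^*$ as an optimal solution.

3. If $a, b \in S_k^*$, then $S_{k+1} = S_k^*$; $k \leftarrow k + 1$; Goto 2.

4. If $a, b \in S_k \setminus S_k^*$, then $S_{k+1} = S_k \setminus S_k^*$; $k \leftarrow k + 1$; Goto 2.

5. Return $\frac{3 n \log |S_k|}{\rho_k}$.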
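
The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the minimum normalized cut in step 2 is found by brute-force enumeration (exponential in $|S_k|$) rather than by a spectral method such as that of Shi and Malik, the weights are taken to be the entries of a symmetric stochastic matrix $T$, and $a \ne b$. The names `min_normalized_cut` and `clustering_bound` are hypothetical.

```python
import math
from itertools import combinations

def min_normalized_cut(w, nodes):
    """Step 2 by brute force: return (rho_k, S_k^*) over proper subsets of nodes."""
    best_value, best_side = float("inf"), None
    for r in range(1, len(nodes)):
        for side in combinations(nodes, r):
            other = [v for v in nodes if v not in side]
            cut = sum(w[i][j] for i in side for j in other)
            value = len(nodes) * cut / (len(side) * len(other))
            if value < best_value:
                best_value, best_side = value, set(side)
    return best_value, best_side

def clustering_bound(w, a, b):
    """Return (bound on m_ab + m_ba, final cluster S_k); assumes a != b."""
    n = len(w)
    nodes = list(range(n))                          # step 1: S_0 = N
    while True:
        rho, side = min_normalized_cut(w, nodes)    # step 2
        if a in side and b in side:                 # step 3
            nodes = sorted(side)
        elif a not in side and b not in side:       # step 4
            nodes = [v for v in nodes if v not in side]
        else:                                       # the cut separates a and b
            return 3 * n * math.log(len(nodes)) / rho, nodes   # step 5

# Two tightly knit triangles {0,1,2} and {3,4,5} joined by one weak link 2-3.
T = [[0.4, 0.3, 0.3, 0.0, 0.0, 0.0],
     [0.3, 0.4, 0.3, 0.0, 0.0, 0.0],
     [0.3, 0.3, 0.3, 0.1, 0.0, 0.0],
     [0.0, 0.0, 0.1, 0.3, 0.3, 0.3],
     [0.0, 0.0, 0.0, 0.3, 0.4, 0.3],
     [0.0, 0.0, 0.0, 0.3, 0.3, 0.4]]
bound, cluster = clustering_bound(T, 0, 1)
print(cluster, round(bound, 2))
```

In this example the first cut isolates the triangle containing both nodes, and the second cut separates them inside that triangle, so the returned bound reflects the local connectivity rather than the weak global bottleneck.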

Figure 3 illustrates the steps of Algorithm 1 for a highly clustered graph. Each of the
regions in Figure 3 represents a highly connected subgraph. We observe that the global
cut given by $S_1$ does not separate $a$ and $b$, so it need not give a tight characterization of
the bottleneck between $a$ and $b$. Nevertheless, $S_1$ gives a better estimate of the cluster
containing $a$ and $b$. Repeating the above steps, the cluster size reduces until we obtain
a normalized cut separating $a$ and $b$. By Theorem 12, this cut provides a bound on
the mean commute time between $a$ and $b$ that characterizes the bottleneck between
such nodes. So far, we have seen in this example and Example 2 that graph clustering
via recursive partitioning can monotonically improve upon the bounds on the excess
influence (cf. Theorem 12). Unfortunately, that is not always the case as discussed in
the following example. In fact, we need further assumptions on the graph in order to
obtain monotone improvement via graph clustering.
Figure 3: Graph clustering algorithm via successive graph cutting using normalized
minimum cut criterion.


Figure 4: Social network graph with a central hub
Example 4. Consider a social network graph of size $n$ depicted in Figure 4. The central
region is a complete graph of size $n/2$. Each of the $k$ clusters on the cycle is a complete
graph of size $n/(2k)$, which is connected to the central hub via edges of total weight $h$.
Moreover, the clusters on the cycle are connected with total edge weight $r$.

If $r > kh/8$, then $C_0$ would be the minimum normalized cut rather than cuts of the
form $C_1$. Hence, $\rho_0$ in step 2 of Algorithm 1 is given by
\[ \rho_0 = \frac{n \cdot kh}{\frac{n}{2} \cdot \frac{n}{2}} = \frac{4kh}{n}. \]
After removing the central cluster, we obtain $C_2$ as the minimum normalized cut
over the cycle, with the following value
\[ \rho_1 = \frac{\frac{n}{2} \cdot 2r}{\frac{n}{4} \cdot \frac{n}{4}} = \frac{16 r}{n}. \]
Therefore, we conclude that $\rho_1 < \rho_0$ if and only if $\frac{kh}{8} < r < \frac{kh}{4}$, i.e., the upper bound
obtained by Algorithm 1 on the mean commute time between $a$ and $b$ is not smaller than
that of Lemma 3. That is because by removing the central cluster, we have eliminated
the possibility of reaching the destination via shortcuts of the central hub, and the only
way to reach the destination is to walk through the cycle.
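
The arithmetic of Example 4 can be reproduced with exact fractions. The sketch below assumes that cut $C_0$ separates the hub ($n/2$ nodes) from the cycle ($n/2$ nodes) with total cut weight $kh$, and that cut $C_2$ splits the cycle into two halves of $n/4$ nodes with total cut weight $2r$; both readings are inferred from the example rather than stated explicitly in it. The function names are illustrative.

```python
from fractions import Fraction

def rho_0(n, k, h):
    """Normalized value of cut C_0: |S_0| = n, sides of size n/2, cut weight k*h."""
    return Fraction(n) * k * h / (Fraction(n, 2) * Fraction(n, 2))

def rho_1(n, r):
    """Normalized value of cut C_2 on the cycle: |S_1| = n/2, sides of size n/4,
    cut weight 2*r."""
    return Fraction(n, 2) * 2 * r / (Fraction(n, 4) * Fraction(n, 4))

n, k, h = 40, 4, 1
for r in (Fraction(3, 4), Fraction(2)):
    # With k*h = 4, the window k*h/8 < r < k*h/4 is 1/2 < r < 1.
    print(r, rho_0(n, k, h), rho_1(n, r), rho_1(n, r) < rho_0(n, k, h))
```

With these numbers, $r = 3/4$ falls inside the window and the second cut value drops below the first, whereas $r = 2$ lies outside it and the cut value increases.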

Next,  we  show  that  the  bounds  given  by  Algorithm  1  are  monotonically  improving, 
if  the  successive  cuts  are  disjoint. 

Definition 7. Consider an undirected graph $(\mathcal{N}, \mathcal{A})$. The cuts defined by $S_1, S_2 \subseteq \mathcal{N}$
are disjoint with respect to $\mathcal{N}$ if
\[ \delta(S_1) \cap \delta(S_2) = \emptyset, \]
where
\[ \delta(S) = \{\{i, j\} \in \mathcal{A} \mid i \in S,\ j \in S^c\}. \]

Theorem 13. Let $\rho_k$ and $S_k$ be generated by the $k$-th iteration of running Algorithm
1 on the social network graph $(\mathcal{N}, \mathcal{A})$. If the cuts corresponding to $S_{k+1}$ and $S_{k+2}$ are
disjoint with respect to $S_k$, then $\rho_{k+1} \ge \rho_k$.


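Proof. By definition of $\rho_k$ in step 2 of Algorithm 1, we have for $S_{k+2} \subseteq S_k$
\[ \rho_k = \frac{|S_k| \sum_{i \in S_{k+1},\, j \in S_k \setminus S_{k+1}} w_{ij}}{|S_{k+1}| \cdot |S_k \setminus S_{k+1}|} \le \frac{|S_k| \sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+2}} w_{ij}}{|S_{k+2}| \cdot |S_k \setminus S_{k+2}|}. \tag{37} \]
But $S_{k+1}$ and $S_{k+2}$ are disjoint with respect to $S_k$, and $S_{k+2} \subseteq S_{k+1} \subseteq S_k$. It is
straightforward to show that
\[ \{\{i, j\} \in \mathcal{A} \mid i \in S_{k+2},\ j \in S_k \setminus S_{k+1}\} \subseteq \delta(S_{k+1}) \cap \delta(S_{k+2}) = \emptyset, \]
which implies
\[ \sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+2}} w_{ij} = \sum_{i \in S_{k+2},\, j \in S_k \setminus S_{k+1}} w_{ij} + \sum_{i \in S_{k+2},\, j \in S_{k+1} \setminus S_{k+2}} w_{ij} = \sum_{i \in S_{k+2},\, j \in S_{k+1} \setminus S_{k+2}} w_{ij}. \]
Therefore, by combining the above relation with (37) and the definition of $\rho_{k+1}$, we
obtain
\begin{align*}
\frac{\rho_{k+1}}{\rho_k} &\ge \left( \frac{|S_{k+1}|}{|S_{k+2}| \cdot |S_{k+1} \setminus S_{k+2}|} \right) \left( \frac{|S_k|}{|S_{k+2}| \cdot |S_k \setminus S_{k+2}|} \right)^{-1} \\
&= \frac{|S_{k+1}| \cdot |S_k \setminus S_{k+2}|}{|S_k| \cdot |S_{k+1} \setminus S_{k+2}|} \\
&= \frac{|S_{k+1}| \left( |S_k \setminus S_{k+1}| + |S_{k+1} \setminus S_{k+2}| \right)}{|S_{k+1} \setminus S_{k+2}| \left( |S_{k+1}| + |S_k \setminus S_{k+1}| \right)} \tag{38} \\
&= \left( 1 + \frac{|S_k \setminus S_{k+1}|}{|S_{k+1} \setminus S_{k+2}|} \right) \left( 1 + \frac{|S_k \setminus S_{k+1}|}{|S_{k+1}|} \right)^{-1} \ge 1,
\end{align*}
where (38) holds because $S_{k+2} \subseteq S_{k+1} \subseteq S_k$, and the last inequality is true because
$S_{k+1} \setminus S_{k+2} \subseteq S_{k+1}$, and $S_{k+2}$ is nonempty. □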

6      Conclusions 

This  paper  analyzed  the  spread  of  misinformation  in  large  societies.  Our  analysis  is 
motivated  by  the  widespread  differences  in  beliefs  across  societies  and  more  explicitly, 
the  presence  of  many  societies  in  which  beliefs  that  appear  to  contradict  the  truth  can 
be  widely  held.  We  argued  that  the  possibility  that  such  misinformation  can  arise  and 
spread  is  the  manifestation  of  the  natural  tension  between  information  aggregation  and 
misinformation  spreading  in  the  society. 

We  modeled  a  society  as  a  social  network  of  agents  communicating  (meeting)  with 
each  other.  Each  individual  holds  a  belief  represented  by  a  scalar.  Individuals  meet 
pairwise  and  exchange  information,  which  is  modeled  as  both  individuals  adopting  the 
average of their pre-meeting beliefs. When all individuals engage in this type of
information exchange, the society will be able to aggregate the initial information held by all
individuals.  This  effective  information  aggregation  forms  the  benchmark  against  which 
we  compared  the  possible  spread  of  misinformation. 
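
The benchmark can be illustrated with a short simulation. The sketch below assumes that at each step a pair of agents is drawn uniformly at random from the whole population (a complete meeting graph, which is a simplification of the general meeting probabilities in the model) and that no agent is forceful; the function name `simulate_averaging` is hypothetical.

```python
import random

def simulate_averaging(beliefs, meetings, seed=0):
    """Repeated pairwise averaging; returns the belief vector after `meetings` steps."""
    rng = random.Random(seed)
    x = list(beliefs)
    for _ in range(meetings):
        i, j = rng.sample(range(len(x)), 2)   # two distinct agents meet
        x[i] = x[j] = (x[i] + x[j]) / 2       # both adopt the pre-meeting average
    return x

initial = [0.0, 1.0, 2.0, 3.0, 10.0]
final = simulate_averaging(initial, meetings=2000)
print(sum(initial) / len(initial), final)
```

Because each meeting preserves the sum of beliefs, all agents end up near the average of the initial beliefs, which is the efficient aggregation benchmark.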

Misinformation  is  introduced  by  allowing  some  agents  to  be  "forceful,"  meaning  that 
they  influence  the  beliefs  of  (some)  of  the  other  individuals  they  meet,  but  do  not 
change  their  own  opinion.  When  the  influence  of  forceful  agents  is  taken  into  account, 
this  defines  a  stochastic  process  for  belief  evolution,  and  our  analysis  exploited  the  fact 
that  this  stochastic  process  (Markov  chain)  can  be  decomposed  into  a  part  induced  by 
the  social  network  matrix  and  a  part  corresponding  to  the  influence  matrix. 

Under  the  assumption  that  even  forceful  agents  obtain  some  information  (however 
infrequent)  from  some  others,  we  first  show  that  beliefs  in  this  class  of  societies  converge 
to a consensus among all individuals (under some additional weak regularity conditions).
This  consensus  value  is  a  random  variable,  and  the  bulk  of  our  analysis  characterizes  its 
behavior,  in  particular,  providing  bounds  on  how  much  this  consensus  can  differ  from 
the  efficient  information  aggregation  benchmark. 

We presented three sets of results. Our first set of results quantifies the extent of
misinformation in the society as a function of the number and properties of forceful agents
and the mixing properties of the Markov chain induced by the social network matrix.
In particular, we showed that social network matrices whose second largest eigenvalues
are bounded away from one (large spectral gaps), that is, matrices that correspond to
fast-mixing graphs, will place tight bounds on the extent of misinformation.
The intuition for this result is that in such societies individuals that ultimately
have some influence on the beliefs of forceful agents rapidly inherit the beliefs of the rest
of the society and thus the beliefs of forceful agents ultimately approach those of the
rest of the society and cannot have a large impact on the consensus beliefs. The extreme
example is provided by expander graphs, where, when the number and the impact of
forceful agents is finite, the extent of misinformation becomes arbitrarily small as the
size of the society becomes large. In contrast, the worst outcomes are obtained when
there are several forceful agents and forceful agents themselves update their beliefs only
on the basis of information they obtain from individuals most likely to have received
their own information previously (i.e., when the graph is slow-mixing).

Our second set of results exploits more explicitly the location of forceful agents within
a social network. A given social network will lead to very different types of limiting
behavior depending on the context in which the forceful agents are located. We provided
a tight characterization for graphs with forceful essential edges, that is, graphs
representing societies in which a forceful agent links two disconnected clusters. Such
graphs approximate situations in which forceful agents, such as media outlets or political
leaders, themselves obtain all of their information from a small group of other individuals.
The interesting and striking result in this case is that the excess influence of all of the
members of the small group is the same, even if some of them are not directly linked
to forceful agents. We then extended these findings to more general societies using the
notion of information bottlenecks.

Our third set of results provides new efficient graph clustering algorithms for
computing tighter bounds on excess influence.

We view our paper as a first attempt at quantifying misinformation in society. As
such, we made several simplifying assumptions and emphasized the characterization
results to apply for general societies. Many areas of future investigation stem from this
endeavor. First, it is important to consider scenarios in which learning and information
updating are, at least partly, Bayesian. Our non-Bayesian framework is a natural
starting point, both because it is simpler to analyze and because the notion of misinformation
is more difficult to introduce in Bayesian models. Nevertheless, game theoretic models
of communication can be used for analyzing situations in which a sender may explicitly
try to mislead one or several receivers. Second, one can combine a model of
communication along the lines of our setup with individuals taking actions with immediate payoff
consequences and also updating on the basis of their payoffs. Misinformation will then
have short-run payoff consequences, but whether it will persist or not will depend on
how informative payoffs are and on the severity of its short-run payoff consequences.
Third, it would be useful to characterize what types of social networks are more robust
to the introduction of misinformation and how agents might use simple rules in order to
avoid misinformation.

Finally,  our  approach  implies  that  the  society  (social  network)  will  ultimately  reach 
a  consensus,  even  though  this  consensus  opinion  is  a  random  variable.  In  practice,  there 
are  widespread  differences  in  beliefs  in  almost  all  societies.  There  is  little  systematic 
analysis  of  such  differences  in  beliefs  in  the  literature  at  the  moment,  and  this  is  clearly 
an  important  and  challenging  area  for  future  research.  Our  framework  suggests  two 
fruitful  lines  of  research.  First,  although  a  stochastic  consensus  is  eventually  reached  in 
our  model,  convergence  can  be  very  slow.  Thus  characterizing  the  rate  of  convergence 
to  consensus  in  this  class  of  models  might  provide  insights  about  what  types  of  societies 
and  which  sets  of  issues  should  lead  to  such  belief  differences.  Second,  if  we  relax  the 
assumption  that  even  forceful  agents  necessarily  obtain  some  (albeit  limited)  information 
from  others,  thus  removing  the  "no  man  is  an  island"  feature,  then  it  can  be  shown  that 
the  society  will  generally  not  reach  a  consensus.  Nevertheless,  characterizing  differences 
in  opinions  in  this  case  is  difficult  and  requires  a  different  mathematical  approach.  We 
plan  to  investigate  this  issue  in  future  work. 
Appendix A

Preliminary  Lemmas,  Sections  3  and  4 

This  appendix  presents  two  lemmas  that  will  be  used  in  proving  the  convergence  of 
agent  beliefs  (i.e.,  Theorem  1)  and  in  establishing  properties  of  the  social  network  matrix 
T  in  Appendix  C. 

The first lemma provides conditions under which a nonnegative $n \times n$ matrix $M$ is
primitive, i.e., there exists a positive integer $k$ such that all entries of the $k$-th power of
$M$, $M^k$, are positive (see [33]). The lemma also provides a positive uniform lower bound
on the entries of the matrix $M^k$ as a function of the entries of $M$ and the properties
of the graph induced by the positive entries of matrix $M$. A version of this lemma was
established in [28]. We omit the proof here since it is not directly relevant to the rest of
the analysis.

Lemma 5. Let $H$ be a nonnegative $n \times n$ matrix that satisfies the following conditions:

(a) The diagonal entries of $H$ are positive, i.e., $H_{ii} > 0$ for all $i$.

(b) Let $\mathcal{E}$ denote a set of edges such that the graph $(\mathcal{N}, \mathcal{E})$ is connected. For all
$(i, j) \in \mathcal{E}$, the entry $H_{ij}$ is positive, i.e., $\mathcal{E} \subseteq \{(i, j) \mid H_{ij} > 0\}$.

Let $d$ denote the maximum shortest path length between any $i, j$ in the induced graph
$(\mathcal{N}, \mathcal{E})$, and $\eta > 0$ be a scalar given by
\[ \eta = \min \left\{ \min_{i \in \mathcal{N}} H_{ii},\ \min_{(i, j) \in \mathcal{E}} H_{ij} \right\}. \]
Then, we have
\[ [H^d]_{ij} \ge \eta^d \quad \text{for all } i, j. \]
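
A small numerical check makes the statement of Lemma 5 concrete. The sketch below uses plain Python lists for matrices; the helper names and the example matrix are illustrative, and the diameter $d$ of the edge set is supplied by the caller rather than computed.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_pow(h, d):
    """d-th power of a square matrix for an integer d >= 1."""
    result = h
    for _ in range(d - 1):
        result = mat_mul(result, h)
    return result

def lemma5_holds(h, edges, d):
    """Check [H^d]_ij >= eta^d for all i, j, with eta as defined in Lemma 5."""
    eta = min(min(h[i][i] for i in range(len(h))),
              min(h[i][j] for i, j in edges))
    return all(entry >= eta ** d for row in mat_pow(h, d) for entry in row)

# Path graph 0 - 1 - 2 with self-loops; its diameter is d = 2.
H = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(mat_pow(H, 2), lemma5_holds(H, edges, 2))
```

Here $\eta = 0.25$, so the lemma guarantees entries of $H^2$ no smaller than $0.0625$; the smallest entry of $H^2$ in this example is $0.125$.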


The second lemma considers a sequence $z(k)$ generated by a linear time-varying
update rule, i.e., given some $z(0)$, the sequence $\{z(k)\}$ is generated by
\[ z(k) = H(k)\, z(k-1) \quad \text{for all } k \ge 0, \]
where $H(k)$ is a stochastic matrix for all $k \ge 0$. We introduce the matrices $\Phi(k, s) =
H(k) H(k-1) \cdots H(s)$ to relate $z(k+1)$ to $z(s)$ for $s \le k$, i.e.,
\[ z(k+1) = \Phi(k, s)\, z(s). \]
The lemma shows that, under some assumptions on the entries of the matrix $\Phi(k, s)$, the
disagreement in the components of $z(k)$, defined as the difference between the maximum
and minimum components of $z(k)$, decreases with $k$ and provides a bound on the amount
of decrease.
Lemma 6. Let $\{H(k)\}$ be a sequence of $n \times n$ stochastic matrices. Given any $z(0) \in \mathbb{R}^n$,
let $\{z(k)\}$ be a sequence generated by the linear update rule
\[ z(k) = H(k)\, z(k-1) \quad \text{for all } k \ge 0. \tag{39} \]
Assume that there exists some integer $B > 0$ and scalar $\theta > 0$ such that
\[ [\Phi(s + B - 1, s)]_{ij} \ge \theta \quad \text{for all } i, j, \text{ and } s \ge 0. \]
For all $k \ge 0$, define $M(k) \in \mathbb{R}$ and $m(k) \in \mathbb{R}$ as follows:
\[ M(k) = \max_{i} z_i(k), \qquad m(k) = \min_{i} z_i(k). \tag{40} \]
Then, for all $s \ge 0$, we have $n\theta \le 1$ and
\[ M(s + B) - m(s + B) \le (1 - n\theta)\,(M(s) - m(s)). \]

Proof. In view of the linear update rule (39), we have for all $i$,
\[ z_i(s + B) = \sum_{j=1}^{n} [\Phi(s + B - 1, s)]_{ij}\, z_j(s) \quad \text{for all } s \ge 0. \]
We rewrite the preceding relation as
\[ z_i(s + B) = \sum_{j=1}^{n} \theta\, z_j(s) + \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij}\, z_j(s), \tag{41} \]
where $[\tilde{\Phi}(s + B - 1, s)]_{ij} = [\Phi(s + B - 1, s)]_{ij} - \theta$ for all $i, j$. Since by assumption
$[\Phi(s + B - 1, s)]_{ij} \ge \theta$ for all $i, j$, we have
\[ [\tilde{\Phi}(s + B - 1, s)]_{ij} \ge 0 \quad \text{for all } i, j. \]
Moreover, since the matrices $H(k)$ are stochastic, the product matrix $\Phi(s + B - 1, s)$ is
also stochastic, and therefore we have
\[ \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij} = 1 - n\theta \quad \text{for all } i. \]
From the preceding two relations, we obtain $1 - n\theta \ge 0$ and
\[ (1 - n\theta)\, m(s) \le \sum_{j=1}^{n} [\tilde{\Phi}(s + B - 1, s)]_{ij}\, z_j(s) \le (1 - n\theta)\, M(s), \]
where $m(s)$ and $M(s)$ are defined in Eq. (40). Combining this relation with Eq. (41),
we obtain for all $i$
\[ (1 - n\theta)\, m(s) \le z_i(s + B) - \sum_{j=1}^{n} \theta\, z_j(s) \le (1 - n\theta)\, M(s). \]
Since this relation holds for all $i$, we have
\[ (1 - n\theta)\, m(s) \le m(s + B) - \sum_{j=1}^{n} \theta\, z_j(s), \]
\[ M(s + B) - \sum_{j=1}^{n} \theta\, z_j(s) \le (1 - n\theta)\, M(s), \]
from which we obtain
\[ M(s + B) - m(s + B) \le (1 - n\theta)\,(M(s) - m(s)) \quad \text{for all } s \ge 0. \]
□
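
The contraction in Lemma 6 is easy to observe numerically in the simplest case. The sketch below assumes a constant matrix $H(k) = H$ with strictly positive entries, so that the assumption holds with $B = 1$ and $\theta$ equal to the smallest entry of $H$; the function names and the example matrix are illustrative.

```python
def step(h, z):
    """One update z <- H z for a matrix stored as a list of rows."""
    return [sum(h_ij * z_j for h_ij, z_j in zip(row, z)) for row in h]

def spread(z):
    """Disagreement M(k) - m(k) between the largest and smallest components."""
    return max(z) - min(z)

def check_contraction(h, z, steps):
    """Verify spread(z(k+1)) <= (1 - n*theta) * spread(z(k)) along the trajectory."""
    n = len(z)
    theta = min(min(row) for row in h)    # B = 1: every entry of H is >= theta
    for _ in range(steps):
        nxt = step(h, z)
        if spread(nxt) > (1 - n * theta) * spread(z) + 1e-12:
            return False
        z = nxt
    return True

H = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
print(check_contraction(H, [0.0, 1.0, 4.0], steps=20))
```

In this example $\theta = 0.1$ and $n = 3$, so the lemma guarantees that the disagreement shrinks by at least a factor of $0.7$ per step; the first step already reduces it from $4$ to $1.3$.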

Appendix  B 

Properties  of  the  Mean  Interaction  and  Transition  Matrices, 
Sections  3  and  4 

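We establish some properties of the mean interaction matrix $\tilde{W}$ and the transition
matrices $\Phi(k, s)$ under the assumptions discussed in Section 2.2. Recall that transition
matrices are given by
\[ \Phi(k, s) = W(k) W(k-1) \cdots W(s+1) W(s) \quad \text{for all } k \text{ and } s \text{ with } k \ge s, \tag{42} \]
with $\Phi(k, k) = W(k)$ for all $k$. Also note that the mean interaction matrix is given by
$\tilde{W} = E[W(k)]$ for all $k$. In view of the belief update model (4)-(5), the entries of the
matrix $\tilde{W}$ can be written as follows. For all $i \in \mathcal{N}$, the diagonal entries are given by
\[ [\tilde{W}]_{ii} = 1 - \frac{\sum_{j \ne i} (p_{ij} + p_{ji})}{n} + \frac{1}{n} \left[ \sum_{j \ne i} p_{ij} \left( \frac{\beta_{ij}}{2} + \alpha_{ij}\epsilon + \gamma_{ij} \right) + \sum_{j \ne i} p_{ji} \left( \frac{\beta_{ji}}{2} + \alpha_{ji} + \gamma_{ji} \right) \right], \tag{43} \]
and for all $i \ne j \in \mathcal{N}$, the off-diagonal entries are given by
\[ [\tilde{W}]_{ij} = \frac{1}{n} \left[ p_{ij} \left( \frac{\beta_{ij}}{2} + \alpha_{ij}(1 - \epsilon) \right) + p_{ji} \frac{\beta_{ji}}{2} \right]. \tag{44} \]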

Using  the  assumptions  of  Section  2.2,  Lemma  5,  and  the  explicit  expressions  for  the 
entries  of  the  matrix  W,  we  have  the  following  result. 
Lemma 7. Let $d$ be the maximum shortest path length between any $i, j$ in the graph
$(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)], and $\eta$ be a scalar given by
\[ \eta = \min \left\{ \min_{i \in \mathcal{N}} [\tilde{W}]_{ii},\ \min_{(i, j) \in \mathcal{E}} [\tilde{W}]_{ij} \right\} \tag{45} \]
[cf. Eqs. (43) and (44)].

(a) The scalar $\eta$ is positive and we have
\[ [\tilde{W}^d]_{ij} \ge \eta^d \quad \text{for all } i, j. \]

(b) We have
\[ P\left\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \right\} \ge \frac{\eta^d}{2} \quad \text{for all } s \ge 0,\ i, \text{ and } j. \]

Proof. (a) We show that under Assumptions 1 and 3, the mean interaction matrix $\tilde{W}$
has positive diagonal entries and the set $\mathcal{E}$ [cf. Eq. (2)] is a subset of the link set induced
by the positive elements of $\tilde{W}$. Together with the Connectivity assumption, part (a)
then follows from Lemma 5.

By Assumption 1, we have for all $i$, $\sum_{j \ne i} p_{ij} = 1$ and $p_{ij} \ge 0$ for all $j$. This implies
that $\sum_{j \ne i} p_{ji} \le n - 1$ and therefore
\[ 1 - \frac{\sum_{j \ne i} (p_{ij} + p_{ji})}{n} \ge 0 \quad \text{for all } i. \]
Since $\sum_{j \ne i} p_{ij} = 1$ for all $i$, there exists some $j$ such that $p_{ij} > 0$, i.e., $(i, j) \in \mathcal{E}$. In view
of the information exchange model, we have $\beta_{ij} > 0$ or $\alpha_{ij} > 0$ or $\gamma_{ij} > 0$, implying that
\[ \frac{1}{n}\, p_{ij} \left( \frac{\beta_{ij}}{2} + \alpha_{ij}\epsilon + \gamma_{ij} \right) > 0. \]
Combining the preceding two relations with Eq. (43), we obtain
\[ [\tilde{W}]_{ii} > 0 \quad \text{for all } i. \tag{46} \]
We next show that for any link $(i, j)$ in the set $\mathcal{E}$, the entry $[\tilde{W}]_{ij}$ is positive, i.e.,
\[ \mathcal{E} \subseteq \{(i, j) \mid [\tilde{W}]_{ij} > 0\}. \]
For any $(i, j) \in \mathcal{E}$, we have $p_{ij} > 0$, and therefore $\beta_{ij} + \alpha_{ij} > 0$ (cf. Assumption 3). This
implies that
\[ p_{ij} \left( \frac{\beta_{ij}}{2} + \alpha_{ij}(1 - \epsilon) \right) > 0, \]
which by Eq. (44) yields $[\tilde{W}]_{ij} > 0$. Together with Eq. (46), this shows that the scalar
$\eta$ defined in (45) is positive. By Assumption 2, the graph $(\mathcal{N}, \mathcal{E})$ is connected. Using
the identification $H = \tilde{W}$ in Lemma 5, we see that the conditions of this lemma are
satisfied, establishing part (a).
(b) For all $i, j$ and $s \ge 0$, we have
\begin{align*}
P\left\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \right\} &= P\left\{ 1 - [\Phi(s + d - 1, s)]_{ij} \le 1 - \frac{\eta^d}{2} \right\} \\
&= 1 - P\left\{ 1 - [\Phi(s + d - 1, s)]_{ij} > 1 - \frac{\eta^d}{2} \right\}. \tag{47}
\end{align*}
The Markov Inequality states that for any nonnegative random variable $Y$ with a finite
mean $E[Y]$, the probability that the outcome of the random variable $Y$ exceeds any
given scalar $\delta > 0$ satisfies
\[ P\{Y \ge \delta\} \le \frac{E[Y]}{\delta}. \]
By applying the Markov inequality to the random variable $1 - [\Phi(s + d - 1, s)]_{ij}$ [which
is nonnegative and has a finite expectation in view of the stochasticity of the matrix
$\Phi(s + d - 1, s)$ for all $s \ge 0$], we obtain
\[ P\left\{ 1 - [\Phi(s + d - 1, s)]_{ij} \ge 1 - \frac{\eta^d}{2} \right\} \le \frac{E\left[ 1 - [\Phi(s + d - 1, s)]_{ij} \right]}{1 - \eta^d/2}. \]
Combining with Eq. (47), this yields
\[ P\left\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \right\} \ge 1 - \frac{E\left[ 1 - [\Phi(s + d - 1, s)]_{ij} \right]}{1 - \eta^d/2}. \tag{48} \]
By the definition of the transition matrices [cf. Eq. (42)], we have
\[ E[\Phi(s + d - 1, s)] = E[W(s + d - 1) W(s + d - 2) \cdots W(s)] = \tilde{W}^d, \]
where the second equality follows from the assumption that $W(k)$ is independent and
identically distributed over $k$. By part (a), this implies that
\[ [E[\Phi(s + d - 1, s)]]_{ij} \ge \eta^d \quad \text{for all } i, j, \]
which combined with Eq. (48) yields
\[ P\left\{ [\Phi(s + d - 1, s)]_{ij} \ge \frac{\eta^d}{2} \right\} \ge 1 - \frac{1 - \eta^d}{1 - \eta^d/2} = \frac{\eta^d/2}{1 - \eta^d/2} \ge \frac{\eta^d}{2}, \]
establishing the desired result. □
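
The final chain of inequalities rests on a short piece of algebra, spelled out here for completeness. Writing $x = \eta^d$ (our shorthand, with $0 < x \le 1$ since $\eta$ is a positive entry of a stochastic matrix), the bound $E[1 - [\Phi(s+d-1,s)]_{ij}] \le 1 - x$ gives:

```latex
\[
1 - \frac{1 - x}{1 - x/2}
  = \frac{(1 - x/2) - (1 - x)}{1 - x/2}
  = \frac{x/2}{1 - x/2}
  \ge \frac{x}{2},
\qquad \text{because } 0 < 1 - \frac{x}{2} < 1 .
\]
```

The threshold $\eta^d/2$ in part (b) is thus chosen so that the Markov bound leaves a probability of at least the same size.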

The  next  two  lemmas  establish  properties  of  transition  matrices. 
Lemma 8.

(a) $[\Phi(k, s)]_{ii} \ge \epsilon^{k - s + 1}$ for all $k$ and $s$ with $k \ge s$, and all $i \in \mathcal{N}$ with probability one.

(b) Assume that there exist integers $K, B \ge 1$ and a scalar $\zeta > 0$ such that for some
$s \ge 0$ and $k \in \{0, \ldots, K\}$, we have
\[ [\Phi(s + (k+1)B - 1, s + kB)]_{ij} \ge \zeta \quad \text{for some } i, j. \]
Then,
\[ [\Phi(s + KB - 1, s)]_{ij} \ge \zeta \epsilon^{KB - 1} \quad \text{with probability one.} \]

Proof. (a) We let $s$ be arbitrary and prove the relation by induction on $k$. By the
definition of the transition matrices [cf. Eq. (42)], we have $\Phi(s,s) = W(s)$. Thus, the
relation $[\Phi(k,s)]_{ii} \ge \epsilon^{k-s+1}$ holds for $k = s$ from the definition of the update matrix
$W(k)$ [cf. Eq. (5)]. Suppose now that the relation holds for some $k \ge s$ and consider
$[\Phi(k+1,s)]_{ii}$. We have

$$[\Phi(k+1,s)]_{ii} = \sum_{h=1}^{n} [W(k+1)]_{ih}[\Phi(k,s)]_{hi} \ge [W(k+1)]_{ii}[\Phi(k,s)]_{ii} \ge \epsilon^{k-s+2},$$

where the first inequality follows from the nonnegativity of the entries of $\Phi(k,s)$, and
the second inequality follows from the inductive hypothesis together with $[W(k+1)]_{ii} \ge \epsilon$.
(b) For any $s \ge 0$, we have

$$[\Phi(s+KB-1,s)]_{ij} = \sum_{h=1}^{n} [\Phi(s+KB-1,\, s+(k+1)B)]_{ih}\,[\Phi(s+(k+1)B-1,\, s)]_{hj}$$
$$\ge [\Phi(s+KB-1,\, s+(k+1)B)]_{ii}\,[\Phi(s+(k+1)B-1,\, s)]_{ij}$$
$$\ge \epsilon^{(K-k-1)B}\,[\Phi(s+(k+1)B-1,\, s)]_{ij},$$

where the last inequality follows from part (a). Similarly,

$$[\Phi(s+(k+1)B-1,s)]_{ij} = \sum_{h=1}^{n} [\Phi(s+(k+1)B-1,\, s+kB)]_{ih}\,[\Phi(s+kB-1,\, s)]_{hj}$$
$$\ge [\Phi(s+(k+1)B-1,\, s+kB)]_{ij}\,[\Phi(s+kB-1,\, s)]_{jj}$$
$$\ge \zeta\,\epsilon^{kB},$$

where the second inequality follows from the assumption $[\Phi(s+(k+1)B-1,\, s+kB)]_{ij} \ge \zeta$
and part (a). Combining the preceding two relations gives $[\Phi(s+KB-1,s)]_{ij} \ge \zeta\,\epsilon^{(K-1)B} \ge \zeta\,\epsilon^{KB-1}$, where the last step uses $\epsilon \le 1$ and $B \ge 1$. This is the desired result. □
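
Part (a) of Lemma 8 can be illustrated numerically. The sketch below is a hedged example, not the paper's update model: `random_update_matrix` is a hypothetical generator that simply produces row-stochastic matrices with diagonal entries at least `eps`, which is the only property the induction uses.

```python
import random


def random_update_matrix(n, eps, rng):
    """Random row-stochastic matrix whose diagonal entries are at least eps.
    This generator is an assumption for illustration only."""
    rows = []
    for i in range(n):
        raw = [rng.random() + 1e-12 for _ in range(n)]
        total = sum(raw)
        row = [(1.0 - eps) * x / total for x in raw]
        row[i] += eps
        rows.append(row)
    return rows


def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][h] * B[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(updates):
    """Phi(k, s) = W(k) W(k-1) ... W(s) for updates = [W(s), ..., W(k)]."""
    phi = updates[0]
    for W in updates[1:]:
        phi = matmul(W, phi)
    return phi


def min_diagonal(M):
    return min(M[i][i] for i in range(len(M)))


rng = random.Random(0)
n, eps, length = 4, 0.3, 6  # length = k - s + 1
updates = [random_update_matrix(n, eps, rng) for _ in range(length)]
phi = transition_matrix(updates)

# Lemma 8(a): every diagonal entry of Phi(k, s) is at least eps^(k - s + 1).
print(min_diagonal(phi) >= eps ** length)
```

The bound is typically far from tight for dense matrices such as these; it becomes relevant when most of the mass stays on the diagonal.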

Lemma 9. We have

$$P\Big\{[\Phi(s+n^2d-1,\, s)]_{ij} \ge \frac{\eta^d}{2}\,\epsilon^{n^2d-1}, \ \text{for all } i, j\Big\} \ge \Big(\frac{\eta^d}{2}\Big)^{n^2} \qquad \text{for all } s \ge 0,$$

where the scalar $\eta > 0$ and the integer $d$ are the constants defined in Lemma 7.
Proof. Consider a particular ordering of the elements of an $n \times n$ matrix and let $k_{ij} \in
\{0, \ldots, n^2-1\}$ denote the unique index for element $(i,j)$. From Lemma 8(b), we have

$$P\Big\{[\Phi(s+n^2d-1,\, s)]_{ij} \ge \frac{\eta^d}{2}\,\epsilon^{n^2d-1}, \ \text{for all } i, j\Big\}$$
$$\ge P\Big\{[\Phi(s+(k_{ij}+1)d-1,\, s+k_{ij}d)]_{ij} \ge \frac{\eta^d}{2}, \ \text{for all } i, j\Big\}$$
$$= \prod_{(i,j)} P\Big\{[\Phi(s+(k_{ij}+1)d-1,\, s+k_{ij}d)]_{ij} \ge \frac{\eta^d}{2}\Big\}$$
$$\ge \Big(\frac{\eta^d}{2}\Big)^{n^2}.$$

Here the equality follows from the independence of the random events

$$\Big\{[\Phi(s+(k+1)d-1,\, s+kd)]_{ij} \ge \frac{\eta^d}{2}\Big\}$$

over all $k = 0, \ldots, n^2-1$, and the last inequality follows from Lemma 7(b). □

Appendix  C 

Properties  of  the  Social  Network  Matrix,  Section  4 

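The next lemma studies the properties of the social network matrix $T$. Note that
the entries of the matrix $T$ can be written as follows: For all $i \in \mathcal{N}$, the diagonal entries
are given by

$$[T]_{ii} = 1 - \frac{\sum_{j \ne i} p_{ij}}{n} - \frac{\sum_{j \ne i} p_{ji}}{n} + \frac{1}{n}\Big[\sum_{j \ne i} p_{ij}\Big(\frac{1-\gamma_{ij}}{2} + \gamma_{ij}\Big) + \sum_{j \ne i} p_{ji}\Big(\frac{1-\gamma_{ji}}{2} + \gamma_{ji}\Big)\Big], \tag{49}$$

and for all $i \ne j \in \mathcal{N}$, the off-diagonal entries are given by

$$[T]_{ij} = \frac{1}{n}\Big[p_{ij}\,\frac{1-\gamma_{ij}}{2} + p_{ji}\,\frac{1-\gamma_{ji}}{2}\Big]. \tag{50}$$
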


Lemma 10. Let $T$ be the social network matrix [cf. Eq. (9)]. Then, we have:

(a) The matrix $T^k$ converges to a stochastic matrix with identical rows $\frac{1}{n}e'$ as $k$ goes
to infinity, i.e.,

$$\lim_{k \to \infty} T^k = \frac{1}{n}\,ee'.$$

(b) For any $z(0) \in \mathbb{R}^n$, let the sequence $z(k)$ be generated by the linear update rule

$$z(k) = Tz(k-1) \qquad \text{for all } k \ge 1.$$

For all $k \ge 0$, define $M(k) \in \mathbb{R}$ and $m(k) \in \mathbb{R}$ as follows:

$$M(k) = \max_{i \in \mathcal{N}} z_i(k), \qquad m(k) = \min_{i \in \mathcal{N}} z_i(k).$$

Then, for all $k \ge 0$, we have

$$M(k) - m(k) \le \delta^k\,(M(0) - m(0)).$$

Here $\delta > 0$ is a constant given by

$$\delta = (1 - n\chi^d)^{1/d}, \qquad \chi = \min_{(i,j) \in \mathcal{E}} \Big\{\frac{1}{n}\Big[p_{ij}\,\frac{1-\gamma_{ij}}{2} + p_{ji}\,\frac{1-\gamma_{ji}}{2}\Big]\Big\},$$

and $d$ is the maximum shortest path length in the graph $(\mathcal{N}, \mathcal{E})$ [cf. Eq. (3)].

Proof. (a) By Assumption 1, we have for all $i$, $\sum_{j \ne i} p_{ij} = 1$ and $p_{ij} \ge 0$ for all $j$. This
implies that $\sum_{j \ne i} p_{ji} \le n-1$ and therefore

$$1 - \frac{\sum_{j \ne i}(p_{ij} + p_{ji})}{n} \ge 0 \qquad \text{for all } i. \tag{51}$$


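Since $\sum_{j \ne i} p_{ij} = 1$ for all $i$, there exists some $j$ such that $p_{ij} > 0$, i.e., $(i,j) \in \mathcal{E}$. By
Assumption 3, this implies that $\beta_{ij} + \alpha_{ij} = 1 - \gamma_{ij} > 0$, showing that $[T]_{ii} > 0$ for all $i$.
Similarly, for any $(i,j) \in \mathcal{E}$, we have $p_{ij} > 0$ and therefore $1 - \gamma_{ij} > 0$, showing that
$[T]_{ij} > 0$ for all $(i,j) \in \mathcal{E}$. Using Eq. (51) in Eqs. (49) and (50), it follows that for all $i$

$$[T]_{ii} \ge [T]_{ij} \qquad \text{for all } j.$$

Thus, we can use Lemma 5 with the identification

$$\chi = \min_{(i,j) \in \mathcal{E}} \Big\{\frac{1}{n}\Big[p_{ij}\,\frac{1-\gamma_{ij}}{2} + p_{ji}\,\frac{1-\gamma_{ji}}{2}\Big]\Big\} \tag{52}$$

and obtain

$$[T^d]_{ij} \ge \chi^d \qquad \text{for all } i, j, \tag{53}$$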

i.e., $T$ is a primitive matrix and therefore the Markov chain with transition probability
matrix $T$ is regular. It follows from Theorem 3(a) that for any $z(0) \in \mathbb{R}^n$, we have

$$\lim_{k \to \infty} T^k z(0) = e\bar{z},$$

where $\bar{z}$ is given by $\bar{z} = \pi' z(0)$ for some probability vector $\pi$. Since $T$ is a stochastic and
symmetric matrix, it is doubly stochastic. Denoting $z(k) = T^k z(0)$, this implies that
the average of the entries of the vector $z(k)$ is the same for all $k$, i.e.,

$$\frac{1}{n}\sum_{i=1}^{n} z_i(k) = \frac{1}{n}\sum_{i=1}^{n} z_i(0) \qquad \text{for all } k \ge 0.$$

Combining the preceding two relations, we obtain

$$\lim_{k \to \infty} \frac{1}{n}\sum_{i=1}^{n} z_i(k) = \bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i(0),$$

establishing the desired relation.

(b) In view of Eq. (53), we can use Lemma 6 with the identifications

$$H(k) = T, \qquad B = d, \qquad \epsilon = \chi^d,$$

where $\chi$ is defined in Eq. (52), and obtain

$$M(k) - m(k) \le (1 - n\chi^d)^{k/d}\,(M(0) - m(0)).$$

□
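
Both parts of Lemma 10 can be illustrated on a small example. The sketch below is a minimal illustration under stated assumptions: the meeting probabilities `p` and the values `gamma` are made up, the function names are hypothetical, the off-diagonal entries follow Eq. (50), and each diagonal entry is obtained from the fact that the rows of $T$ sum to one. Every pair of agents meets with positive probability here, so $d = 1$ and the contraction factor reduces to $1 - n\chi$.

```python
def social_network_matrix(p, gamma):
    """Build T from meeting probabilities p and weights gamma.
    Off-diagonal entries follow Eq. (50); each diagonal entry is set so
    that the row sums to one (T is stochastic)."""
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                T[i][j] = (p[i][j] * (1 - gamma[i][j]) / 2
                           + p[j][i] * (1 - gamma[j][i]) / 2) / n
        T[i][i] = 1.0 - sum(T[i])  # T[i][i] is still zero at this point
    return T


def step(T, z):
    """One application of the linear update z(k) = T z(k-1)."""
    n = len(z)
    return [sum(T[i][j] * z[j] for j in range(n)) for i in range(n)]


# Hypothetical three-agent example (values are assumptions).
p = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
gamma = [[0.0, 0.2, 0.4], [0.1, 0.0, 0.3], [0.5, 0.2, 0.0]]
T = social_network_matrix(p, gamma)
n = len(T)

chi = min(T[i][j] for i in range(n) for j in range(n) if i != j)
rate = 1.0 - n * chi  # contraction factor for d = 1

z = [0.0, 1.0, 2.0]
spread0 = max(z) - min(z)
history = []
for k in range(1, 101):
    z = step(T, z)
    history.append((k, sum(z) / n, max(z) - min(z)))

print(z)  # all entries approach the initial average
```

The average of the entries stays fixed because $T$ is symmetric and stochastic, while the spread $M(k) - m(k)$ shrinks at least geometrically.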


Appendix  D 

Characterization  of  the  Mean  Commute  Time,  Section  5 

First, we characterize the mean commute time between two nodes for a random walk
on an undirected graph using the Dirichlet principle and its dual, Thompson's principle.

Definition 8. Consider a random walk on a weighted undirected graph $(\mathcal{N}, \mathcal{A})$ with
weight $w_{ij}$ associated to each edge $\{i,j\}$. Define the Dirichlet form $\mathcal{E}$ as follows. For
functions $g : \mathcal{N} \to \mathbb{R}$ write

$$\mathcal{E}(g,g) = \frac{1}{2}\sum_{i}\sum_{j} \frac{w_{ij}}{w}\,\big(g(j) - g(i)\big)^2,$$

where $w = \sum_{i,j} w_{ij}$ is the total edge weight.

Lemma 11. Consider a random walk on a weighted undirected graph with weight $w_{ij}$
associated to each edge $\{i,j\}$. For the mean commute time between distinct nodes $a$ and $b$
we have

$$m_{ab} + m_{ba} = \sup\Big\{\frac{1}{\mathcal{E}(g,g)} : 0 \le g \le 1,\ g(a) = 0,\ g(b) = 1\Big\} \tag{54}$$
$$= w \inf\Big\{\frac{1}{2}\sum_{i}\sum_{j} \frac{f_{ij}^2}{w_{ij}} : f \text{ is a unit flow from } a \text{ to } b\Big\}, \tag{55}$$

where $m_{ab}$ is the mean first passage time from $a$ to $b$, and $w$ is the total edge weight.
Proof. See Section 7.2 of [2]. □
It is worth mentioning that the two forms of the mean commute time characterization
in Lemma 11 are duals of each other. The first form is a corollary of the Dirichlet principle,
while the second is an immediate result of Thompson's principle. Using the electric circuit
analogy, we can think of the function $g(i)$ as the potential associated to node $i$, and the flow $f_{ij}$ as
the current on edge $\{i,j\}$ with resistance $1/w_{ij}$. The expressions in (54) and (55) are equivalent
descriptions of the minimum energy dissipation in such an electric network. Hence, we can
interpret the mean commute time between two particular nodes as the effective resistance
between those nodes in a resistive network, scaled by the total edge weight $w$. This allows us to use the Monotonicity Law to
obtain simpler bounds for the mean commute time.
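
The effective resistance interpretation can be checked on a small graph. The following Python sketch is an illustration under assumptions: the weighted triangle is made up, the helper names are hypothetical, and $w$ is taken as the sum of $w_{ij}$ over ordered pairs, so each edge is counted twice, matching the factor $\frac{1}{2}$ in the double sums. It computes the commute time once from the first passage time equations and once as $w$ times the effective resistance.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def hitting_time(W, a, b):
    """Mean first passage time from a to b for the random walk on W."""
    n = len(W)
    idx = [i for i in range(n) if i != b]
    deg = [sum(W[i]) for i in range(n)]
    # (I - P) h = 1 on the states other than b, with h(b) = 0.
    A = [[(1.0 if i == j else 0.0) - W[i][j] / deg[i] for j in idx]
         for i in idx]
    h = solve(A, [1.0] * len(idx))
    return h[idx.index(a)]


def effective_resistance(W, a, b):
    """Effective resistance between a and b with conductances W[i][j]."""
    n = len(W)
    idx = [i for i in range(n) if i != b]
    deg = [sum(W[i]) for i in range(n)]
    # Ground node b, inject a unit current at a, read the voltage at a.
    L = [[(deg[i] if i == j else 0.0) - W[i][j] for j in idx] for i in idx]
    v = solve(L, [1.0 if i == a else 0.0 for i in idx])
    return v[idx.index(a)]


# Hypothetical weighted triangle (symmetric, zero diagonal).
W = [[0.0, 2.0, 1.0], [2.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
a, b = 0, 2
w = sum(map(sum, W))  # total weight over ordered pairs

commute = hitting_time(W, a, b) + hitting_time(W, b, a)
via_resistance = w * effective_resistance(W, a, b)
print(commute, via_resistance)
```

Working the triangle by hand gives first passage times of 3 and 1.8 and an effective resistance of 0.6 with $w = 8$, so both quantities equal 4.8 up to rounding.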

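Lemma 12. (Monotonicity Law) Let $\tilde{w}_{ij} \le w_{ij}$ be the edge-weights for two undirected
graphs. Then,

$$m_{av} + m_{va} \le \Big(\frac{w}{\tilde{w}}\Big)\,(\tilde{m}_{av} + \tilde{m}_{va}), \qquad \text{for all } a, v,$$

where $w = \sum_{i,j} w_{ij}$ and $\tilde{w} = \sum_{i,j} \tilde{w}_{ij}$ are the total edge weights.

Proof. Let $f^*$ and $\tilde{f}^*$ be the optimal solutions of (55) for the original and modified
graphs, respectively. We can write

$$m_{av} + m_{va} = w\,\frac{1}{2}\sum_{i}\sum_{j} \frac{(f^*_{ij})^2}{w_{ij}} \le w\,\frac{1}{2}\sum_{i}\sum_{j} \frac{(\tilde{f}^*_{ij})^2}{w_{ij}} \le w\,\frac{1}{2}\sum_{i}\sum_{j} \frac{(\tilde{f}^*_{ij})^2}{\tilde{w}_{ij}} = \frac{w}{\tilde{w}}\,(\tilde{m}_{av} + \tilde{m}_{va}),$$

where the first inequality follows from the optimality of $f^*$ and the feasibility of $\tilde{f}^*$, and the second from $\tilde{w}_{ij} \le w_{ij}$. □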

By the electric network analogy, Lemma 12 states that increasing resistances in a
circuit cannot decrease the effective resistance between any two nodes in the network. The
Monotonicity Law can be extremely useful in providing simple bounds for mean commute
times.
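
The Monotonicity Law can be tested directly from first passage times, without going through effective resistances. The sketch below is a hedged illustration: it reads the lemma as $m_{av} + m_{va} \le (w/\tilde{w})(\tilde{m}_{av} + \tilde{m}_{va})$ whenever $\tilde{w}_{ij} \le w_{ij}$, uses two made-up four-node weight matrices, and approximates first passage times by fixed-point iteration. All function names are hypothetical.

```python
def mean_hitting_time(W, a, b, sweeps=2000):
    """Approximate the mean first passage time from a to b by iterating
    h(i) = 1 + sum_j P[i][j] h(j) for i != b, with h(b) = 0."""
    n = len(W)
    deg = [sum(row) for row in W]
    h = [0.0] * n
    for _ in range(sweeps):
        h = [0.0 if i == b else
             1.0 + sum(W[i][j] / deg[i] * h[j] for j in range(n))
             for i in range(n)]
    return h[a]


def commute_time(W, a, v):
    return mean_hitting_time(W, a, v) + mean_hitting_time(W, v, a)


def total_weight(W):
    """Sum of w_ij over ordered pairs (each edge counted twice)."""
    return sum(map(sum, W))


# Original weights and a modified graph with some weights reduced
# (both matrices are assumptions for illustration).
W = [[0.0, 2.0, 1.0, 0.0],
     [2.0, 0.0, 1.0, 1.0],
     [1.0, 1.0, 0.0, 3.0],
     [0.0, 1.0, 3.0, 0.0]]
W_mod = [[0.0, 1.0, 1.0, 0.0],
         [1.0, 0.0, 0.5, 1.0],
         [1.0, 0.5, 0.0, 3.0],
         [0.0, 1.0, 3.0, 0.0]]

ratio = total_weight(W) / total_weight(W_mod)
pairs = [(a, v) for a in range(4) for v in range(a + 1, 4)]
checks = [(commute_time(W, a, v), ratio * commute_time(W_mod, a, v))
          for a, v in pairs]

# Each left entry should not exceed the corresponding right entry.
print(all(lhs <= rhs + 1e-9 for lhs, rhs in checks))
```

The modified graph keeps every edge of the original one with a positive weight, so both random walks remain irreducible and the iteration converges.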